This book contains the collected and unified material necessary 
for the presentation of such branches of modern cybernetics as the 
theory of electronic digital computers, theory of discrete automata, 
theory of discrete self-organizing systems, automation of thought 
processes, theory of image recognition, etc. The book discusses 
the fundamentals of the theory of boolean functions, algorithm 
theory, the principles of the design of electronic digital computers and 
of universal algorithmic languages, the fundamentals of perceptron theory, 
and some theoretical questions of the theory of self-organizing systems. 

Many fundamental results in mathematical logic and algorithm 
theory are presented in summary form, without detailed proofs, and 
in some cases without any proof. 

The book is intended for a broad audience of mathematicians and 
scientists of many specialties who wish to acquaint themselves with 
the problems of modern cybernetics. 
FOREWORD 
The objective of the present book is to acquaint the reader with 
several new scientific directions which constitute the basis of 
cybernetics in its modern conception. In the most general framework all these 
trends can be subdivided into two major groups: the general theory of 
information conversion, and the theory and principles of the design of 
various kinds of information converters. However, the material which 
can be associated with these major trends is so extensive that it 
could hardly be presented even in summary form in a single book. 
Therefore it has been necessary to select the material in 
accordance with some general principles. 

The material for the present book has been selected in accordance 
with two basic principles. The first principle is the requirement for 
a sufficiently rigorous formulation of the material to permit presenting 
it in the form of a mathematical theory (although with the bent in 
the direction of practical simulation which is characteristic of cy- 
bernetics). The second principle is that the author limits himself, as 
a rule, to the digital methods of representing information and the dig- 
ital conversion of information. 

As a result of this selection, the book contains the following 
basic sections: algorithm theory (including programming for general-purpose 
electronic digital computers and universal algorithmic languages 
for programming), theory of discrete automata (including the 
theory of boolean functions and the principles of the 
design of general-purpose electronic digital computers), theory of 
discrete self-organizing systems (including elements of the theory of 
optimal decisions) and, finally, mathematical logic (propositional 
calculus, restricted predicate calculus and formal arithmetic), considered 
as a basis for automating the construction of deductive theories 
(theories based on a particular system of axioms). 

The degree of detail of the presentation of the material is deter- 
mined first of all by the degree of its novelty. The newer branches, 
related to cybernetics itself, are discussed in greater detail, and the 
fundamental theorems are supplied with quite detailed proofs. At the 
same time, in such branches as abstract algorithm theory and mathematical 
logic, which have developed within the framework of traditional 
mathematics, the material is presented more briefly and proofs, as a rule, 
are omitted. 

The author has attempted, however, to convey the basic ideas and methods 
which are used to establish the validity of such propositions, fundamental 
from the point of view of mathematical logic, as the Gödel theorem on the 
incompleteness of arithmetic or the theorems which establish the 
algorithmic insolubility of particular problems. 

The book does not pretend to replace specialized monographs on 
the individual sections which are included here. Its primary intention 
is to help a wide audience of mathematicians and engineers master 
the minimum of knowledge which is necessary for work in the field of 
the theoretical problems of modern "digital" cybernetics. It is well 
known that the existence of detailed monographs on a particular subject 
does not always make it possible for readers without specialized 
preparation to become acquainted with that subject. Convincing proof of 
this is the fact that, in spite of the existence of specialized 
monographs, such a theorem as the Gödel theorem mentioned above, which is of 
fundamental importance for all of mathematics, is known to 
large numbers of mathematicians only by hearsay. 

As for the present book, it presupposes on the part of the reader (and 
only in one chapter, the fourth) a knowledge of only those elements of 
mathematical analysis and probability theory which are known to practically 
every engineer, to say nothing of mathematicians. The less widely 
known mathematical results necessary for understanding the main 
content of the book are included as supplementary material. An example 
of such supplementary material is the series of propositions of 
probability theory presented in Chapter 4, §2. 

In case the reader wishes to extend his knowledge in a particular 
area or become acquainted with the detailed proofs of those proposi- 
tions which, although included in the book, are not proved in detail, 
we give below a summary of the contents of the book, indicating 
the specialized monographs (in Russian) pertaining to the individual 
sections. Unfortunately, such monographs do not exist for every 
section of the book. 

The first chapter presents a description of the basic theoretical 
universal algorithmic systems (normal Markov algorithms, the 
Kolmogorov-Uspenskiy algorithmic system, recursive functions, the Post 
algorithms, and the Turing machine). Also presented are the basic principles of 
the proofs of the algorithmic insolubility of certain very simple mass 
problems. 

At the present time there is no unifying monograph available on 
the theory of algorithms as a whole. Moreover, not all the questions 
mentioned above are covered in any detail in the monographic literature. 
Among the principal monographs on the individual algorithmic 
systems we might mention the following: on the theory of normal 
algorithms, Theory of Algorithms, A.A. Markov (Ref 53); on the theory of 
recursive functions and Turing machines, Introduction to Metamathematics, 
S.C. Kleene (Ref 42) and Course on Computable Functions, V.A. Uspenskiy 
(Ref 76). 

The theory of boolean functions and its applications to the theory 
of discrete automata circuits are presented in the second chapter. 
These questions are discussed in greater detail in the monograph of 
V.M. Glushkov, Synthesis of Digital Automata (Ref 26). 

In addition, the second chapter covers the fundamentals of propositional 
calculus. More detail on propositional calculus can be found, 
for example, in the monograph of P.S. Novikov, Elements of Mathematical 
Logic (Ref 61). 

The third chapter is devoted to the abstract and structural theo- 
ry of discrete (finite) automata. The questions relating to this sub- 
ject are considered in more detail in the monograph of Glushkov 
mentioned above. These questions are covered from a somewhat different 
standpoint in the monograph of N.Ye. Kobrinskiy and B.A. Trakhtenbrot, 
Introduction to the Theory of Finite Automata (Ref 47). 

The fundamentals of the theory of discrete self-organizing sys- 
tems are presented in the fourth chapter. A definition is given of the 
quantitative measure of self-organization and self-learning, a study 
is made of the behaviour of random automata and automata operating in 
conditions of random external inputs. Special attention is devoted to 
the problem of the recognition of images and the theory of one class 
of devices (the so-called α-perceptron) intended for the solution of 
this problem. Some questions of the simulation of conditioned reflexes 
are considered, as well as questions of teaching meaning recognition 
and the generation of new concepts. At the end of the chapter, in 
connection with the idea of self-adjustment and extremal regulation, 
descriptions are given of several general methods for the solution of 
extremal problems (the method of steepest descent and its refinement, 
the simplex method of solution of the problems of linear programming 
and the so-called method of sequential analysis of variants for the 
solution of the problems of dynamic programming). 

So far no unifying monograph is available on the material of the 
fourth chapter. Moreover, almost all the questions discussed in this 
chapter (with the exception of the method of steepest descent and the 
simplex method) have not yet been covered in the monographic literature. 
Several questions allied with those considered in this chapter 
(but not completely identical to them) are covered in Neurodynamics by 
F. Rosenblatt, which has not yet been translated into Russian. A large 
number of monographs is devoted to the methods of solution of extremal 
problems (with the exception of the method of sequential analysis 
of variants). However, we shall not list them here since these 
questions have no direct relation to the primary theme of the present 
book. 

The fifth chapter covers the basic principles of the design of 
the general-purpose electronic digital computers and the programming 
of these machines. So many monographs have been devoted to this 
question that it would be very difficult to list them all. In particular, 
on the subject of programming we might cite the monograph of B.V. 
Gnedenko, V.S. Korolyuk and Ye.L. Yushchenko, Elements of Programming 
(Ref 31). As for the principles of computer design, in spite of the 
existence of many good specialized monographs on this question, a 
detailed presentation of the material in the framework we need does not 
exist; the principles of the design of electronic digital computers 
are presented, as a rule, in isolation from the general theory of 
In addition, the fifth chapter presents a detailed description of 
the universal algorithmic language ALGOL-60 and gives examples of 
ALGOL programming of various problems, primarily from the theory of 
self-organizing systems. In particular, a discussion is given of the 
question of the programming of the perceptron learning process and of 
a simplified model of the process of biological evolution. Again, on 
this question there is little information in the monographic literature: 
the Report on the Algorithmic Language ALGOL-60 (edited by P. Naur), 
published by the Computer Center of the USSR Academy of Sciences 
(Moscow, 1960), is of a reference nature and not suitable for practical 
instruction in the ALGOL language. 

In the last (sixth) chapter there is given a summary exposition 
of the fundamentals of the restricted predicate calculus (including 
the formal system of Gentzen) and of formal arithmetic (including the 
Gödel theorem on the incompleteness of arithmetic). Detailed proofs of the 
propositions presented can be found in the previously cited monographs 
of Kleene and Novikov. This chapter also contains elements of the auto- 
mation of proofs and formulations of theorems in deductive theories. 
The questions touched on here have not yet been covered in the mono- 
graphic literature. 

As indicated by the list of the material presented in the book, 
several interesting branches of modern cybernetics are not included in 
the book. Considering the criteria mentioned previously for the 
selection of material, we could, for example, include a presentation of the 
fundamentals of mathematical linguistics or elements of game theory. 
However, even without this, the considerable size of the book has 
forced the author to refrain from attempts to include any additional 
material. At the same time, the contents of the book do encompass 
those questions which at the present time are usually considered the 
basis of theoretical cybernetics (given our restriction 
to discrete methods). The author hopes, therefore, that the book will 
help individuals occupied with particular applied aspects of cybernetics, 
as well as those interested in its theoretical problems, to master the 
mathematical apparatus of cybernetics and to prepare for work in its 
theoretical fields. 

In the present book extensive use has been made of material from 
courses on the various branches of cybernetics and mathematical logic 
presented by the author at Kiev University and at the Kiev House of 
Scientific and Technical Propaganda in 1959-1962. A part of this 
material (the theory of algorithms, for example) has been published 
previously for internal use. The present book can be considered to be the first 
sufficiently complete textbook for students of the branches of 
cybernetics mentioned above. 








Chapter 1 
ABSTRACT THEORY OF ALGORITHMS 
§1. ALPHABETIC OPERATORS AND ALGORITHMS 

In modern mathematics it is customary to call constructively 
specified correspondences between words in abstract alphabets 
algorithms. 

Any finite ensemble of objects, termed the letters of a given al- 
phabet, is called an abstract alphabet. The nature of these objects is 
a matter of complete indifference to us. For example, the letters of 
the alphabet of any language (Russian, Latin, Greek, etc.), digits, 
any symbols, figures, etc., can be considered to be letters of abstract 
alphabets. If we wish to, we can introduce an abstract alphabet 
whose letters will be considered to be entire words of any particular 
language (Russian, for example). It is important only that the alpha- 
bet considered be finite, i.e. that it consist of a finite number of 
letters. 

Having introduced the concept of an (abstract) alphabet, we define a 
word in this alphabet as any finite ordered sequence of letters. For 
example, in the alphabet A = A(x,y) consisting of the two letters x 
and y we consider any of the sequences x, y, xx, xy, yx, yy, xxx, ... to be 
words. The number of letters in a word is normally termed the length 
of the word, so that the words we just listed in the alphabet have 
respectively the lengths 1, 1, 2, 2, 2, 2, 3, ... 
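As a small illustration (a sketch, not part of the original exposition), the words over the alphabet A(x, y) can be generated in order of increasing length. Letters are modeled here as one-character Python strings; the generator also yields the empty word as the empty string "", which the listing skips. The name `words` is hypothetical.

```python
from itertools import count, islice, product

def words(alphabet):
    """Yield every word over `alphabet` in order of length: "", then length 1, 2, ..."""
    for length in count(0):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)

# Skip the empty word and take the first seven nonempty words over A(x, y).
first = list(islice(words("xy"), 1, 8))
print(first)                    # ['x', 'y', 'xx', 'xy', 'yx', 'yy', 'xxx']
print([len(w) for w in first])  # [1, 1, 2, 2, 2, 2, 3]
```

Within each length the words come out in lexicographic order, so every word of the alphabet eventually appears.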

Along with words of positive length (consisting of at least 
one letter), in many cases it is convenient to consider also an empty 
word, not containing even one letter. In the present chapter the 
small Latin letter e is used to designate the empty word. Sometimes, 
however, it is convenient to designate the empty word in complete 
accordance with its definition, not writing any letter in the place 
corresponding to this word. 

We note that, with the accepted definition, the concept of a word 
in the Russian alphabet will differ from the concept of a word as ac- 
cepted in ordinary language. With our definition, words are to be con- 
sidered any combination of letters, including meaningless combinations: 
the combinations of letters "algorithm", "mathematics", "klt", "dddd" 
must to an equal degree be considered words of the Russian alphabet 
(considered as an abstract alphabet). 

With expansion of an alphabet, i.e., with the inclusion of new letters 
in it, the concept of the word may undergo significant 
changes. If, for example, we expand the Russian alphabet by the 
"letters" " (quotation mark) and , (comma), then the four words 
which we have just written out in the Russian alphabet can be 
considered as a single word in the alphabet expanded in this fashion. By 
complementing the Russian alphabet with the punctuation marks and the 
separation mark (the empty space left between two neighboring words), we 
can if we wish consider entire phrases, paragraphs and even entire 
books as individual words. 

In just the same way, the expression 69 + 72, which is two words 
(69 and 72) in the alphabet A of the 10 digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9) 
joined by the sum sign, can be considered as a single word in the 
expanded alphabet A' which is obtained by adjoining to A the 
new letter "+" (sum sign). 

Alphabetic operator or alphabetic mapping is the term given to any 
correspondence (function) which associates with words in a particular 
alphabet words in the same or another fixed alphabet. The first alphabet 
is here termed the input alphabet, and the second the output alphabet, 
of the given operator. When the input and output alphabets coincide, 
we say that the alphabetic operator is given in the corresponding alphabet. 

Hereafter we consider primarily single-valued alphabetic operators, 
associating with each input word (word in the input alphabet of 
the operator) no more than one output word (word in the output alphabet 
of the operator). If the alphabetic operator does not associate 
with a given input word p any output word (not even the empty word), 
then we say that it is not defined on this word. The set of all 
words on which an alphabetic operator is defined is termed its domain 
of definition. 
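To make concrete the difference between "not defined" and "defined, with the empty word as output", here is a minimal sketch of a partial alphabetic operator. The operator `phi` is hypothetical and chosen only for illustration: it reverses a word over A(x, y) and is defined only on words of even length. `None` marks words outside its domain of definition, while the empty string stands for the empty word e.

```python
from typing import Optional

def phi(word: str) -> Optional[str]:
    """Hypothetical partial operator in A(x, y): reverse words of even length."""
    if len(word) % 2 != 0:
        return None          # phi is not defined on this word
    return word[::-1]        # the empty word (length 0) is mapped to the empty word

def in_domain(word: str) -> bool:
    """Membership in the domain of definition of phi."""
    return phi(word) is not None

print(phi("xyyy"))     # yyyx
print(phi("xyx"))      # None: outside the domain of definition
print(in_domain(""))   # True: phi is defined on e and yields e
```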

On the basis of the foregoing, we shall hereafter always understand 
(unless otherwise specified) by the term "alphabetic operator" 
a single-valued and, generally speaking, partial mapping of the set of 
words in the input alphabet of the operator into the set of words in its 
output alphabet. 

Since alphabetic operators may be defined on less than all the words, 
we can, without loss of generality, always assume that the input 
and output alphabets of the operator coincide. For this it is sufficient, 
clearly, to combine the input and output alphabets of the given operator 
$\varphi$ into one common alphabet A and to consider $\varphi$ as an operator in this 
combined alphabet, defined only on those words which belonged to the 
original domain of definition of the operator $\varphi$. 

With each alphabetic operator there is associated an intuitive 
concept of its complexity. The simplest operators are those which 
perform letter-by-letter mapping. This mapping consists in each letter x 
of the input word p being replaced by some letter y of the output 
alphabet of the operator, depending only on the letter x and not on the 
choice of the input word p. A letterwise mapping is completely defined by 
specifying the correspondence between the letters of the input and output 
alphabets. 
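A letter-by-letter mapping is determined by a table of letter correspondences alone. In the sketch below, `letterwise` is a hypothetical helper that builds the operator from such a table. As an assumption of this sketch, the operator is left undefined (returns `None`) on words containing a letter missing from the table.

```python
from typing import Callable, Optional

def letterwise(table: dict[str, str]) -> Callable[[str], Optional[str]]:
    """Build the letter-by-letter operator given by a letter correspondence table."""
    def apply(word: str) -> Optional[str]:
        if any(letter not in table for letter in word):
            return None      # a letter outside the input alphabet: not defined
        # Each letter is replaced independently of the rest of the word.
        return "".join(table[letter] for letter in word)
    return apply

f = letterwise({"x": "0", "y": "1"})
print(f("xxyx"))   # 0010
print(f("xz"))     # None
```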

The so-called coding transformations, which for brevity we shall 
term simply codings, are of great importance for the later discussion. 
In the simplest case the words in one alphabet, say in alphabet A, are 
coded by words in the other alphabet, B, as follows: to each letter $a_i$ 
of the alphabet A there is associated some finite sequence 
$b_{i_1} b_{i_2} \ldots b_{i_k}$ of letters in the alphabet B, called the code of the 
corresponding letter, such that different letters of the alphabet A 
are associated with different codes. 

To construct the desired coding transformation it is now 
sufficient to replace all the letters of any word p in the alphabet 
A by the codes corresponding to them. The word thus obtained in 
the alphabet B we term the code of the original word p. We stipulate 
that the coding transformation must necessarily be reversible. In other 
words, different words in alphabet A must have different codes. The 
condition of reversibility of the coding is nothing other than the 
condition that the corresponding coding transformation be one-to-one. 

It is easy to see that reversibility of the coding is not ensured 
by the single condition that the codes of the various letters (words 
of length 1) be different. Indeed, if the letter $a_1$ is associated 
with the code bb, and the letter $a_2$ with the code b, then the code 
bbb corresponds to the word $a_1 a_2$, to the word $a_2 a_1$, 
and to the word $a_2 a_2 a_2$. 
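The ambiguity can be checked mechanically. The sketch below uses a hypothetical function `decodings` that lists every way of cutting a code into letter codes, with the example codes bb for $a_1$ and b for $a_2$. Letters are represented by the strings "a1" and "a2", and all codes are assumed nonempty so that the recursion terminates.

```python
def decodings(code: str, codes: dict[str, str]) -> list[list[str]]:
    """Return every word (list of letters) whose code is `code`; codes must be nonempty."""
    if code == "":
        return [[]]                        # only the empty word has the empty code
    results = []
    for letter, c in codes.items():
        if code.startswith(c):             # c is an initial segment of the code
            for rest in decodings(code[len(c):], codes):
                results.append([letter] + rest)
    return results

codes = {"a1": "bb", "a2": "b"}
for word in decodings("bbb", codes):
    print(word)
# ['a1', 'a2']
# ['a2', 'a1']
# ['a2', 'a2', 'a2']
```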
It is not difficult to verify that the coding will be reversible 
whenever the following two conditions are fulfilled: 

a) the codes of the different letters of the original alphabet A 
are different; 

b) the code of any letter of the alphabet A cannot coincide with 
any of the initial segments of the codes of the other letters of this 
alphabet. 

Indeed, let us assume that both of these conditions are 
satisfied and let the word $q = b_1 b_2 \ldots b_s$ be the code of some word 
$p = a_{i_1} a_{i_2} \ldots a_{i_n}$ in the alphabet A. Let us show that from the code q we 
can uniquely recover the word p. In view of condition b) only one 
initial segment of the word q can coincide with the code of a letter of 
the alphabet A. Clearly, the code of the letter $a_{i_1}$ is such a 
segment. Discarding this segment, we obtain the code $q_1$ of the word 
$p_1 = a_{i_2} \ldots a_{i_n}$. Applying the same reasoning to it, we uniquely restore 
the following letter ($a_{i_2}$) of the word p, and so on. Using this 
technique, all the letters of the word p are restored one after the other. 
Consequently, to any given code there can correspond only one word in 
the alphabet A, which proves the reversibility (one-to-one character) of 
the coding transformation. 
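The proof above is in effect a decoding procedure. Repeatedly find the unique letter code that is an initial segment of what remains, record that letter, and discard the segment. The sketch below (hypothetical names `is_prefix_free`, `encode`, `decode`; letters as strings such as "a1"; all codes assumed nonempty) checks conditions a) and b) and then decodes letter by letter.

```python
def is_prefix_free(codes: dict[str, str]) -> bool:
    """Conditions a) and b): no letter code equals, or is an initial segment of,
    the code of another letter."""
    items = list(codes.items())
    return all(not cj.startswith(ci)
               for li, ci in items for lj, cj in items if li != lj)

def encode(word: list[str], codes: dict[str, str]) -> str:
    return "".join(codes[letter] for letter in word)

def decode(code: str, codes: dict[str, str]) -> list[str]:
    """Restore the word letter by letter, following the proof above."""
    word = []
    while code:
        # Under condition b) at most one letter code is an initial segment of `code`.
        matches = [letter for letter, c in codes.items() if code.startswith(c)]
        if len(matches) != 1:
            raise ValueError("code cannot be split uniquely into letter codes")
        letter = matches[0]
        word.append(letter)
        code = code[len(codes[letter]):]   # discard that segment and repeat
    return word

codes = {"a1": "0", "a2": "10", "a3": "11"}
q = encode(["a2", "a1", "a3", "a1"], codes)
print(is_prefix_free(codes))   # True
print(q)                       # 100110
print(decode(q, codes))        # ['a2', 'a1', 'a3', 'a1']
```

The codes of unequal length here show that condition b), not equal length, is what makes decoding work.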

Condition b) is satisfied if the codes of all the letters of the 
original alphabet have identical length. In this case we shall call the 
coding normal. The use of coding permits reducing the study of 
arbitrary alphabetic transformations to alphabetic transformations in 
some standard alphabet selected once and for all. Most frequently, 
the so-called binary alphabet is chosen as such a standard alphabet, 
consisting of two letters which are usually identified with the digits 
0 and 1. 
Let A be an arbitrary alphabet and B be a standard alphabet (binary, 
for example) consisting of more than one letter. If n is the number 
of letters in alphabet A and m is the number of letters in 
alphabet B, then we can always select the number k so as to satisfy 
the inequality 

$$m^k \geq n. \qquad (1)$$ 

Since the number of different words of length k in the m-letter 
alphabet is clearly equal to $m^k$, inequality (1) shows that we can 
code all the letters in alphabet A with words of length k in alphabet 
B so that the codes of different letters are different. Any such 
coding will be normal and will generate, in light of what was said 
above, a reversible coding transformation of the words in alphabet A 
into words in alphabet B. We designate this transformation by $\alpha$ and 
use $\alpha^{-1}$ to designate the reverse transformation, which transforms each 
word q in the alphabet B that is the code of some word p in alphabet 
A into the word p. 
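A minimal sketch of this construction, assuming letters are single characters: choose the smallest k with $m^k \geq n$, give the letters of A distinct words of length k over B, and implement $\alpha$ and $\alpha^{-1}$. The names `normal_coding`, `alpha`, and `alpha_inv` are hypothetical. The example codes the 10 decimal digits in binary, where k = 4 because $2^3 < 10 \leq 2^4$.

```python
from itertools import product
from typing import Optional

def normal_coding(alphabet_a: str, alphabet_b: str) -> dict[str, str]:
    """Give each letter of A a distinct code of the same length k over B,
    where k is the smallest integer with m**k >= n (inequality (1))."""
    n, m = len(alphabet_a), len(alphabet_b)
    if m < 2:
        raise ValueError("the standard alphabet must contain more than one letter")
    k = 1
    while m ** k < n:
        k += 1
    # product() enumerates the m**k words of length k; zip keeps only the first n.
    words_k = ("".join(t) for t in product(alphabet_b, repeat=k))
    return dict(zip(alphabet_a, words_k))

def alpha(word: str, codes: dict[str, str]) -> str:
    """The coding transformation: replace every letter by its code."""
    return "".join(codes[letter] for letter in word)

def alpha_inv(code: str, codes: dict[str, str]) -> Optional[str]:
    """The reverse transformation; None if `code` is not the code of any word."""
    k = len(next(iter(codes.values())))
    back = {c: letter for letter, c in codes.items()}
    if len(code) % k:
        return None
    chunks = [code[i:i + k] for i in range(0, len(code), k)]
    if any(chunk not in back for chunk in chunks):
        return None
    return "".join(back[chunk] for chunk in chunks)

codes = normal_coding("0123456789", "01")   # n = 10, m = 2, so k = 4
print(codes["9"])                      # 1001
print(alpha("69", codes))              # 01101001
print(alpha_inv("01101001", codes))    # 69
```

Because all codes have the same length, $\alpha^{-1}$ only has to cut the code into blocks of k letters.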

Now if $\varphi$ is an arbitrary alphabetic operator in alphabet A, then 
the transformation $\psi = \alpha^{-1}\varphi\alpha$, obtained by performing in sequence 
the transformations $\alpha^{-1}$, $\varphi$ and $\alpha$ (here and below a product of 
transformations is read from left to right, in the order in which they are 
performed), will obviously be some alphabetic operator in the standard 
alphabet B. We term this operator the alphabetic operator in the alphabet B 
conjugate (by means of the coding $\alpha$) to the alphabetic operator $\varphi$. 

The operator $\varphi$ is uniquely recovered from the conjugate operator 
$\psi$ and the corresponding coding transformation $\alpha$: 

$$\varphi = \alpha\psi\alpha^{-1}. \qquad (2)$$ 

With the aid of this equation, and also of the dual equation written 
previously, 

$$\psi = \alpha^{-1}\varphi\alpha, \qquad (3)$$ 

arbitrary alphabetic operators are reduced to alphabetic operators 
in the standard alphabet. This reduction, of course, can be performed 
by an infinite number of different methods, since there exist infinite- 
ly many different codings of words in any given alphabet by words in 
the standard alphabet. 
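Equations (2) and (3) can be checked on a concrete case. In this sketch the operator `phi` (writing a digit word backwards) and the 4-bit normal coding of the digits are hypothetical choices made only for illustration. Products are applied left to right, so $\psi = \alpha^{-1}\varphi\alpha$ means: decode with $\alpha^{-1}$, apply $\varphi$, encode with $\alpha$.

```python
from itertools import product

# A normal coding alpha of the 10 digits by binary words of length 4 (2**4 >= 10).
CODES = {d: "".join(bits) for d, bits in zip("0123456789", product("01", repeat=4))}
BACK = {c: d for d, c in CODES.items()}

def alpha(word):
    return "".join(CODES[d] for d in word)

def alpha_inv(code):
    """None if `code` is not the code of any digit word."""
    if len(code) % 4:
        return None
    chunks = [code[i:i + 4] for i in range(0, len(code), 4)]
    if any(c not in BACK for c in chunks):
        return None
    return "".join(BACK[c] for c in chunks)

def phi(word):
    """A hypothetical operator in the digit alphabet: write the word backwards."""
    return word[::-1]

def psi(code):
    """Conjugate operator (3): psi = alpha^{-1} phi alpha, applied left to right."""
    word = alpha_inv(code)
    return None if word is None else alpha(phi(word))

def phi_recovered(word):
    """Formula (2): phi = alpha psi alpha^{-1}, applied left to right."""
    return alpha_inv(psi(alpha(word)))

print(psi("00010010"))                        # 00100001, the code of "21"
print(phi_recovered("2024") == phi("2024"))   # True
```

Note that $\psi$ is undefined (returns `None`) on binary words that are not codes of digit words, such as 1111. This matches the convention that alphabetic operators may be partial.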

The described reduction can also be accomplished in the case of 
alphabetic operators whose input and output alphabets are 
different. For example, let $\varphi$ be an arbitrary alphabetic operator with 
the input alphabet A and the output alphabet C, let B be the standard 
alphabet, let $\alpha$ be any (reversible) coding of words in the alphabet A 
by words in the standard alphabet, and let $\gamma$ be an analogous coding of 
the words in alphabet C. 

It is now easy to see that the transformation $\psi = \alpha^{-1}\varphi\gamma$ is an 
alphabetic operator in the standard alphabet B from which, given 
the coding transformations $\alpha$ and $\gamma$, the original 
transformation $\varphi$ is uniquely restored: $\varphi = \alpha\psi\gamma^{-1}$. 
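Here is a sketch of the case of different input and output alphabets, with $\alpha$ a coding of A and $\gamma$ a coding of C, both into the binary standard alphabet. Everything concrete here is a hypothetical illustration: A = {x, y}, C = the decimal digits, and the operator `phi` that counts the letters x.

```python
from itertools import product

# Hypothetical normal codings into B = {0, 1}:
# ALPHA codes A = {x, y} with k = 1, GAMMA codes the digit alphabet C with k = 4.
ALPHA = {"x": "0", "y": "1"}
GAMMA = {d: "".join(bits) for d, bits in zip("0123456789", product("01", repeat=4))}

def encode(word, codes):
    """Normal coding: replace each letter by its code."""
    return "".join(codes[letter] for letter in word)

def decode(code, codes):
    """Inverse of a normal coding; None if `code` is not the code of any word."""
    k = len(next(iter(codes.values())))
    back = {c: letter for letter, c in codes.items()}
    if len(code) % k:
        return None
    chunks = [code[i:i + k] for i in range(0, len(code), k)]
    if any(chunk not in back for chunk in chunks):
        return None
    return "".join(back[chunk] for chunk in chunks)

def phi(word):
    """Hypothetical operator from A to C: the number of x's, written in decimal."""
    return str(word.count("x"))

def psi(code):
    """psi = alpha^{-1} phi gamma: decode into A, apply phi, code the result from C into B."""
    word = decode(code, ALPHA)
    return None if word is None else encode(phi(word), GAMMA)

def phi_recovered(word):
    """phi = alpha psi gamma^{-1}: code into B, apply psi, decode from B into C."""
    return decode(psi(encode(word, ALPHA)), GAMMA)

print(psi("0100"))            # 0011 (the code of "3")
print(phi_recovered("xyxx"))  # 3
```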

The concept of the alphabetic operator is extremely general. 
Virtually any process of information conversion reduces to it or can in 
some sense be reduced to it. Here and in the future, by information we 
shall understand not only meaningful communications but in general 
any information on processes and states of any nature which can be 
detected by the sense organs of man or by instruments. 

For certain specialized forms of information, for example lexical 
or numerical information, the alphabetic method of specification 
is the most natural and is constantly used. The transformations 
of these forms of information reduce to alphabetic operators 
in the most direct fashion: both the input and the output information 
of any information converter in this case can be represented in 
the form of words, and the conversion of the information reduces to 
the establishment of some correspondence between the words. We recall 
that with a suitable expansion of the alphabet, lexical information can be 
represented not only by ordinary words, but also by entire sentences 
and even by any sequences of sentences. 

One of the characteristic tasks of the conversion of lexical 
information is the translation of texts from one language to another. It 
is well known that the translation problem does not reduce to the 
problem of establishing the correspondence between the words of the 
languages which are involved in the translation. If, however, we consider 
as words the entire books or at least individual sections of the book, 
then the problem of translation completely reduces to the problem of 
establishing correspondence between such generalized words. Thus, the 
problem of translation from one language to another can be treated as 
the process of the realization of some alphabetic operator. 

It is worth noting, moreover, that even a high-quality, grammatically 
correct translation admits, as is well known, certain variations 
of the translated text. Therefore the process of translation 
is described, not by the usual single-valued alphabetic operator, 
but by a multi-valued, or so-called probabilistic, alphabetic operator. 
Such an operator associates with each input word from its 
domain of definition not a single output word, but a whole set of 
output words. When this operator is applied to a particular 
input word p, the output word is selected at random from 
the set of output words corresponding to the word p. 
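A probabilistic alphabetic operator can be sketched as a table from input words to sets of admissible output words, together with a random choice at each application. The table `OUTPUTS` is a hypothetical example over A(x, y). The text does not specify the selection probabilities, so a uniform choice is an assumption of this sketch.

```python
import random
from typing import Optional

# Hypothetical probabilistic operator over A(x, y): each word in its domain of
# definition is associated with a whole set of admissible output words.
OUTPUTS = {
    "xy": ["yx", "xyx", "x"],
    "yy": ["y"],
}

def apply_probabilistic(word: str, rng: random.Random) -> Optional[str]:
    """Select an output word at random (uniformly, by assumption);
    None if the word lies outside the domain of definition."""
    choices = OUTPUTS.get(word)
    if choices is None:
        return None
    return rng.choice(choices)

rng = random.Random(1)   # seeded so that runs are reproducible
print([apply_probabilistic("xy", rng) for _ in range(5)])
print(apply_probabilistic("xx", rng))   # None
```

A single-valued operator is the special case in which every set of outputs has exactly one element, as for the word "yy" here.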

In addition to the alphabetic operators for the translation from 
one language to another, we can construct alphabetic operators which 
resolve other problems of the conversion of lexical information, for 
example the problem of editing texts in a particular language, the 
problem of composing abstracts of articles, etc. It is not difficult 
to expand the field of application of the alphabetic operators, using 
the alphabetic representation not only for lexical information but al- 
so for other forms of information. For example, using the known tech- 
niques of chess notation, we can write chess positions in the form of 
words consisting of the letters of the Russian and Latin alphabets, 
digits, and punctuation marks (comma). In this case the process of the 
chess game can be interpreted as the process of establishing the cor- 
respondence between any given position and the position resulting from 
it after performing the next move. Thus, again in this case we are 
dealing with an alphabetic operator (probabilistic, generally speaking). 

Similarly, it is not difficult to represent in the form of processes 
of realization of alphabetic operators many other processes of 
information conversion, for example the orchestration of melodies, the 
solution of mathematical problems, the problem of production planning, 
etc. 

It may seem at first that for the characterization of the conver- 
sion of continuous information (for example, visual or random auditory 
sensations) the concept of the alphabetic operator is insufficient. 
However this is not so, or more precisely, not entirely so. 

The reception and conversion of continuous information is always 
accomplished with the aid of nonideal instruments which do not react 
to extremely small variations of the characteristics of the information 
being converted. In real instruments which detect and convert 
continuous information there always exist several limitations which 
make it possible to consider this information as alphabetic 
information. For greater clarity, let us consider visual information (the 
same phenomena occur with the other forms of specifying continuous 
information). 
The first limitation is that of the resolving power of the 
instrument which receives the information. This limitation leads to the 
situation where sufficiently closely spaced points of the region of space 
over which the information in question is distributed (for example, a 
picture or drawing) are sensed by the instrument (say, the human eye) 
as a single point. This implies the possibility of considering this 
information as given, not at an infinite number of points, 
but only at a finite number of points. 

The second limitation is associated with the limited sensitivity 
of the instrument receiving the information. This limitation leads to 
the instrument being able to distinguish only a finite number of lev- 
els of the quantity carrying the information (for example, the bright- 
ness of individual points of a drawing). 

On the basis of the described limitations we come to the conclu- 
sion that the instrument, as a result of its nonideal nature, can at 
each given instant sense only one pattern of a finite (and not infin- 
ite as it might seem without account for the limitations indicated) 
number of different patterns of the instantaneous spatial distribution 
of the information in question. 

Introducing a special letter to denote each such pattern, we arrive at
a finite alphabet A which, given the limitations indicated, is completely
adequate for characterizing the information arriving at the input of the
(nonideal) instrument under consideration at every given instant of time.
If we denote by n the number of spatial points sensed by the instrument as
individual points, and by m the number of levels of the physical quantity
carrying the information which the instrument distinguishes, then it is
easy to see that the number of letters in the alphabet A is $m^n$, since
each of the n points can independently take any of the m levels (for
simplicity we assume that the number of levels distinguishable by the
instrument is the same for all points of the space).
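The count $m^n$ can be checked by brute force for small n and m. The following Python sketch (the function name `enumerate_patterns` is ours, for illustration only) lists every way of assigning one of m levels to each of n points; under the assumptions of the text, each such pattern would receive its own letter of A.

```python
from itertools import product


def enumerate_patterns(n_points: int, m_levels: int) -> list:
    """Return every pattern: a tuple giving one level in 0..m_levels-1 for each point."""
    return list(product(range(m_levels), repeat=n_points))


patterns = enumerate_patterns(n_points=3, m_levels=2)
print(len(patterns), 2 ** 3)  # both are 8
```

For the human-eye estimate given in the text, n and m are far too large for such an enumeration; only the formula remains usable there.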

Of course, the number of letters in the alphabet A which we have
just estimated may turn out to be extremely large (for the reception of
visual information by the human eye it may be estimated as a one followed
by several thousand zeros). Nevertheless it is still finite, and from the
abstract theoretical point of view the only essential thing is whether the
alphabet A is finite or infinite.

Continuing our investigation, we note that every real instrument
which receives and converts information has, along with the two limitations
indicated, a third limitation: its limited passband, which does not permit
it to distinguish excessively rapid changes of the received quantities. By
the familiar Kotel'nikov principle (Ref 46), limiting the passband is
equivalent to replacing, in information transmission, the usual continuous
time by a conventional discrete time whose neighboring instants are
separated by quite definite (although usually very small) intervals.
Roughly speaking, we select as such an elementary interval the longest
interval of time during which the instrument in question cannot
distinguish variations of the quantity carrying the information.

After this discrete time has been introduced, the information received
by our instrument over any finite segment of time t is naturally
represented as a word in the previously introduced alphabet A. The number
of letters in this word equals the number k of instants
$\tau_1, \tau_2, \ldots, \tau_k$ of discrete time lying in the given
segment t, and its i-th letter (i = 1, 2, ..., k) is the information
received by the instrument at the instant $\tau_i$, expressed as a letter
of the alphabet A.
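A minimal sketch of this encoding, under assumptions of our own: at each instant of discrete time the instrument yields a tuple of n quantized levels in 0..m-1, and the letter for that pattern is taken to be its index when the tuple is read as a base-m numeral (any one-to-one numbering of the $m^n$ patterns would serve equally well). The names `pattern_to_letter` and `signal_to_word` are hypothetical.

```python
def pattern_to_letter(pattern: tuple, m_levels: int) -> int:
    """Number a pattern of levels as a base-m numeral: one letter of A per pattern."""
    index = 0
    for level in pattern:
        if not 0 <= level < m_levels:
            raise ValueError(f"level {level} outside 0..{m_levels - 1}")
        index = index * m_levels + level
    return index


def signal_to_word(frames: list, m_levels: int) -> list:
    """Turn the patterns sensed at instants tau_1..tau_k into a word of length k."""
    return [pattern_to_letter(frame, m_levels) for frame in frames]


word = signal_to_word([(0, 1), (1, 1), (1, 0)], m_levels=2)
print(word)  # [1, 3, 2]
```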

Since analogous considerations apply not only to the input
information but also to the output information, any real information
converter must be regarded (given the limitations indicated above) as an
instrument realizing some alphabetic operator. The alphabetic operator
realized by the instrument completely determines (up to the coding of the
information) the informational essence of the instrument, in other words,
the information conversion it performs.

Thus we have established the extremely great generality of the
concept of the alphabetic operator. In fact, the theory of any information
converter reduces to the study of alphabetic operators. And man encounters
information converters literally at every step of his practical existence:
the various instruments and devices for automatic control are information
converters. Finally, one of the most important and essential aspects of
the study of the activity of man himself is the consideration of man as a
very complex and highly perfected information converter. All this makes it
possible to consider the theory of alphabetic operators one of the most
important component parts of cybernetics.

The basis of the theory of alphabetic operators is formed by the
methods of representing them. When the domain of definition of an
alphabetic operator is finite, the question of its representation is
resolved very simply, at least in the theoretical sense: the operator can
be represented by a simple correspondence table. In the left column of such
a table we write out all the words belonging to the domain of definition of
the operator in question, and in the right column we write the output words
obtained by applying the operator to each word of the left column.
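For a finite domain, the correspondence table can be stored literally. In the Python sketch below, whose table entries are invented purely for illustration, a dictionary plays the part of the two-column table, and a lookup that finds no entry signals a word outside the domain of definition.

```python
from typing import Optional

# Left column: words of the (finite) domain; right column: output words.
# These particular entries are made up for illustration.
TABLE = {
    "x": "yy",
    "xx": "yyy",
    "xy": "",
}


def apply_table(table: dict, word: str) -> Optional[str]:
    """Return the output word, or None if the word is not in the domain."""
    return table.get(word)


print(apply_table(TABLE, "xx"))  # yyy
print(apply_table(TABLE, "yx"))  # None: "yx" is outside the domain
```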

Of course, if the domain of definition of the alphabetic operator is
sufficiently large, this method of representation can become excessively
cumbersome and therefore inapplicable in practice. For the moment, however,
we shall not take such considerations into account, limiting ourselves to
establishing only the theoretical possibility of representing particular
alphabetic operators.

In the case of an infinite domain of definition of the alphabetic
operator, its representation by a simple correspondence table becomes
impossible in principle, since man has no means at his disposal that would
allow him actually to write out or perceive an infinite set of words.
However, it is well known that man long ago learned to represent operators
on infinite sets of words without writing out the entire correspondence
tables. To see this, it suffices to consider, for example, the alphabetic
operator represented by the formula

$$\underbrace{xx\ldots x}_{n\ \text{times}} \;\to\; \underbrace{yy\ldots y}_{n+1\ \text{times}} \qquad (n = 1, 2, \ldots). \tag{4}$$


This formula defines a correspondence on an infinite set of words
without actually writing out the entire correspondence table (which, of
course, cannot be done in this case). In place of the correspondence table
itself, the formula gives a rule by means of which, after a finite number
of steps, one can establish the output word corresponding to any prescribed
input word from the domain of definition of the alphabetic operator being
considered.
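The rule in (4) is finite even though the table it replaces is not. As a Python sketch (the name `operator_4` is ours), the rule checks that the input consists only of x's, reads its length, and writes the output directly; words outside the domain are reported as None.

```python
from typing import Optional


def operator_4(word: str) -> Optional[str]:
    """Rule (4): x repeated n times (n >= 1) maps to y repeated n + 1 times.

    Returns None for words outside the domain: the empty word, or any word
    containing a letter other than x.
    """
    if word == "" or set(word) != {"x"}:
        return None
    return "y" * (len(word) + 1)


print(operator_4("xxx"))  # yyyy
```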

An analogous situation arises every time we need to represent an
alphabetic operator with an infinite domain of definition: in place of the
correspondence table itself we are given a finite number of rules which
make it possible, after a finite number of steps, to find any prescribed
line of this table (the value of the alphabetic operator on any input word
belonging to its domain of definition).

Alphabetic operators represented by means of finite systems of rules are
customarily termed algorithms.


On the basis of the discussion above, it is easy to see that every
alphabetic operator which can actually be represented is of necessity an
algorithm. In particular, all alphabetic operators with finite domains of
definition represented by (finite) correspondence tables are algorithms.
Formula (4) also represents an algorithm.

It is not difficult to construct other examples of algorithms.
Associating with each positive integer its square, we obtain an alphabetic
operator in the alphabet consisting of all the digits of the number system
used to represent these numbers. Since the rules for squaring make it
possible to obtain, after a finite number of steps, the square of any
prescribed integer, this operator can be considered an algorithm.
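The finite rules for squaring can be written out explicitly. In the Python sketch below (helper names are ours), the operator never converts the whole word to a machine number: it works digit by digit with the school rules of long multiplication, so each output word is obtained from the input word in a finite number of steps.

```python
DIGITS = set("0123456789")


def multiply_digit_strings(a: str, b: str) -> str:
    """Multiply two decimal digit strings by the school (long multiplication) rules."""
    result = [0] * (len(a) + len(b))  # digits of the product, least significant first
    for i, da in enumerate(reversed(a)):
        carry = 0
        for j, db in enumerate(reversed(b)):
            total = result[i + j] + int(da) * int(db) + carry
            result[i + j] = total % 10
            carry = total // 10
        result[i + len(b)] += carry
    product_digits = "".join(str(d) for d in reversed(result)).lstrip("0")
    return product_digits or "0"


def square(word: str) -> str:
    """The squaring operator on words over the alphabet {0, 1, ..., 9}."""
    if not word or not set(word) <= DIGITS:
        raise ValueError("input must be a nonempty word of decimal digits")
    return multiply_digit_strings(word, word)


print(square("12"))   # 144
print(square("999"))  # 998001
```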

All the specific alphabetic operators considered in the present 
chapter (including the operators for translation from one language to 
another, chess moves, etc.) also can be represented with the aid of 
finite systems of rules and can, consequently, be considered as algo- 
rithms. 

We must emphasize one distinction between the concepts of the
alphabetic operator and the algorithm. In the concept of the alphabetic
operator only the correspondence itself, established by the operator
between the input and output words, is essential, and not the method by
which this correspondence is established. In the concept of the algorithm,
on the other hand, the primary emphasis is placed on the method of
representing the correspondence established by the algorithm. Thus, an
algorithm is nothing other than an alphabetic operator together with the
rules defining its operation.

The concepts of equality for alphabetic operators and for algorithms
are defined in accordance with the foregoing. Two alphabetic operators are
considered equal if they have the same domain of definition and associate
identical output words with any prescribed input word from this domain.
The concept of equality for algorithms includes the condition of equality
of the corresponding operators, but also requires that the systems of rules
which represent the operation of these algorithms on the input words
coincide. Algorithms for which only the alphabetic transformations
(operators) they define coincide, but, generally speaking, not the methods
of representation, we shall term equivalent algorithms.
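As an illustration of our own, here are two algorithms that realize operator (4) by different rules: one counts the x's, the other rewrites the word letter by letter. They are equivalent but not equal. The helper `agree_up_to` compares them on every word up to a given length; such a finite check can only support equivalence, not prove it for the infinite domain. All names are hypothetical.

```python
from itertools import product
from typing import Optional


def rule_by_counting(word: str) -> Optional[str]:
    """Count the x's and write one more y."""
    if word == "" or set(word) != {"x"}:
        return None
    return "y" * (len(word) + 1)


def rule_by_rewriting(word: str) -> Optional[str]:
    """Rewrite each x as y, letter by letter, then append one more y."""
    if word == "":
        return None
    out = []
    for letter in word:
        if letter != "x":
            return None
        out.append("y")
    out.append("y")
    return "".join(out)


def agree_up_to(alg1, alg2, alphabet: str, max_len: int) -> bool:
    """Compare two algorithms on every word of length <= max_len (a finite check only)."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            word = "".join(letters)
            if alg1(word) != alg2(word):
                return False
    return True


print(agree_up_to(rule_by_counting, rule_by_rewriting, "xy", 6))  # True
```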

Usually in the abstract theory of algorithms we consider only
those algorithms to which single-valued alphabetic operators correspond.
Every such algorithm A is distinguished by the fact that with any input
word p from its domain of definition it associates a completely defined
output word q = A(p), regardless of the conditions under which the
algorithm A operates. Such algorithms and their corresponding alphabetic
operators will be called determinate.

In many cases it is advisable to expand the concept of the
algorithm by introducing into the system of rules describing it the
possibility of random selection of particular words or particular rules.
Here the probability of a particular selection must be either fixed in
advance or determined in the course of carrying out the algorithm. Such
algorithms will be called random; they lead to multi-valued alphabetic
operators. More precisely, for any input word p belonging to the domain of
definition of the random algorithm A, this algorithm uniquely defines the
probability $\alpha_p(q)$ that each output word q appears as the response
to the input word p. In the case of the usual random algorithm the
probabilities $\alpha_p(q)$ must not vary during its functioning, although
the algorithm itself can, of course, give different responses when
repeatedly applied to the same input word p.
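A toy random algorithm, made up for illustration: for every input word p it prefixes x or y, each with probability 1/2, so $\alpha_p(xp) = \alpha_p(yp) = 1/2$. These probabilities are fixed in advance and do not change from call to call, while the individual responses to the same p may differ. The names `random_prefix` and `output_distribution` are hypothetical.

```python
import random


def random_prefix(word: str, rng: random.Random) -> str:
    """Random algorithm: prefix x or y to the input word, each with probability 1/2.

    The probabilities alpha_p(q) are fixed in advance (see output_distribution)
    and do not change between calls.
    """
    return rng.choice("xy") + word


def output_distribution(word: str) -> dict:
    """The probabilities alpha_p(q) this algorithm assigns to each output word q."""
    return {"x" + word: 0.5, "y" + word: 0.5}


rng = random.Random(0)  # fixed seed, only to make the run reproducible
responses = [random_prefix("ab", rng) for _ in range(5)]
print(responses)  # each response is "xab" or "yab"
```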

We need to consider also the so-called self-variable algorithms,
i.e., algorithms which not only transform the input words applied to them
but also themselves change in the process of this transformation. The
result of the action of a self-variable algorithm A on a particular input
word p depends not only on this word but also on the history of the
preceding operation of the algorithm, i.e., on the (finite) sequence of
input words processed by the algorithm A before the word p in question
arrived at its input.

The generalization of the concept of the algorithm by introducing the
possibility of self-variation applies to both determinate and random
algorithms. In the latter case, depending on the history of the previous
operation of the algorithm, the probabilities $\alpha_p(q)$ of the
different output words q associated by the algorithm A with any given
input word p change. This dependence can, moreover, itself be expressed by
a random rather than a determinate function.

Self-variable algorithms are conveniently represented as a system of
two algorithms: the first, the so-called operational algorithm, processes
the input words, and the second, termed the monitoring or controlling
algorithm, introduces specific changes into the first, operational,
algorithm. In Chapter 4 it is shown that whether an algorithm is
self-variable is determined not so much by the structure of the device
which realizes it as by the method of partitioning the input information
into individual words, which, as noted above, is to a considerable degree
arbitrary in the case of abstract alphabets. Thus, depending on the choice
of this method, the same device may in some cases realize a self-variable
algorithm and in other cases a non-self-variable one.
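A toy illustration, entirely of our own construction: the operational algorithm below rewrites letters through a substitution table, and the controlling algorithm swaps the images of x and y after every word, so the same input word can produce different outputs depending on the history. The function `as_single_word` shows the other side of the remark above: if the whole stream of words, joined by a separator assumed not to occur in them, is taken as one input word, the same behavior is an ordinary (non-self-variable) algorithm.

```python
class SelfVariableDevice:
    """Operational algorithm plus a controlling algorithm that modifies it."""

    def __init__(self) -> None:
        # State of the operational algorithm: the image of each letter.
        self.table = {"x": "x", "y": "y"}

    def operational(self, word: str) -> str:
        """Rewrite each letter through the current table (other letters unchanged)."""
        return "".join(self.table.get(ch, ch) for ch in word)

    def controlling(self) -> None:
        """Change the operational algorithm: swap the images of x and y."""
        self.table["x"], self.table["y"] = self.table["y"], self.table["x"]

    def process(self, word: str) -> str:
        out = self.operational(word)
        self.controlling()
        return out


def as_single_word(stream: str, sep: str = "#") -> str:
    """The same behavior seen as one ordinary algorithm on the whole stream."""
    swap = {"x": "y", "y": "x"}
    outputs = []
    for k, word in enumerate(stream.split(sep)):
        if k % 2 == 1:  # an odd number of swaps has happened before this word
            word = "".join(swap.get(ch, ch) for ch in word)
        outputs.append(word)
    return sep.join(outputs)


device = SelfVariableDevice()
words = ["xy", "xy", "xxy"]
outputs = [device.process(w) for w in words]
print(outputs)                          # ['xy', 'yx', 'xxy']
print(as_single_word("#".join(words)))  # xy#yx#xxy
```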

Throughout the first three chapters we shall consider only the 
conventional (determinate, non-self-variable) algorithms without mak- 
ing this stipulation in every instance. In the later chapters use will 
be made also of the generalized concepts of the algorithms introduced 
above. 

§2. NORMAL ALGORITHMS 

In this and the following sections we shall study certain general
methods of representing algorithms which are characterized by the property
of universality, i.e., methods which make it possible to obtain an
algorithm equivalent to any prescribed algorithm. In this chapter various
universal methods of representing algorithms are discussed, not in the
historical sequence in which they were developed, but in the order most
convenient from the point of view of the present volume. We begin our
exposition with the so-called normal algorithms proposed and studied by
Markov (Ref 53).

Every general method of representing algorithms is termed an
algorithmic system. An algorithmic system usually includes objects of two
kinds which, following Kaluzhnin (Ref 37), we shall term operators (or,
more precisely, elementary operators) and identifiers (more precisely,
elementary identifiers). Elementary operators are quite simple (simply
represented) alphabetic operators by whose sequential execution any
algorithm of the algorithmic system in question is realized.

The identifiers serve to recognize particular properties of the
information processed by the algorithm and, depending on the results of
this identification, to vary the order in which the elementary operations
follow one another.

To indicate the set of elementary operators and the order in which
they follow one another in the representation of any specific algorithm,
it is convenient to use directed graphs of a special kind which, following
Kaluzhnin (Ref 37), we shall term the graph-diagrams of the corresponding
algorithms.

The graph-diagram of an algorithm is a finite set of circles (or
other geometrical figures), termed the elements of the graph-diagram, which
are interconnected by arrows. With each element other than the two special
elements termed the input and the output, some elementary operator or
identifier is associated. From each element representing an operator, and
also from the input element, there emerges precisely one arrow; from each
element representing an identifier there emerge precisely two arrows; no
arrow emerges from the output element. Any number of arrows can enter an
element.

The algorithm defined by any given graph-diagram operates as follows.
The input word first enters the input element and travels in the
directions indicated by the arrows, being transformed on passage through
the operator elements by the operators associated with these elements.
When the word enters an identifying element, the condition associated with
this element is checked (the identifier is applied). If the condition is
satisfied, the word leaves the element along one of the arrows (usually
marked with the symbol "+"), and if the condition is not satisfied, along
the other arrow (marked with the symbol "-").

The word is not altered in the identifying elements. If the input
word p applied to the input element of the graph-diagram, after passing
through the elements of the diagram and being transformed, arrives at the
output element after a finite number of steps, the algorithm is considered
applicable to the word p (the word p is in the domain of definition of this
algorithm), and the result of the action of the algorithm on the word p is
the word which is in the output element of the diagram. If, after the word
p is applied to the input element of the graph-diagram, its transformation
and movement along the graph-diagram last infinitely long without reaching
the output element, the algorithm is considered not applicable to the word
p; in other words, the word p is not in the domain of definition of the
algorithm.
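The rules of motion along a graph-diagram can be sketched as a small interpreter. The encoding of elements as tuples is our own assumption, not the book's notation. One deviation must be stated plainly: an endless walk cannot be recognized in general, so the sketch gives up after `max_steps` moves and reports the word as outside the domain (None).

```python
from typing import Optional

# Hypothetical encoding of a graph-diagram: each element name maps to a tuple.
#   ("input", next)                  exactly one outgoing arrow
#   ("op", operator, next)           exactly one outgoing arrow
#   ("id", condition, plus, minus)   two arrows, "+" and "-"
#   ("output",)                      no outgoing arrows


def run_diagram(diagram: dict, word: str, max_steps: int = 10_000) -> Optional[str]:
    """Move the word along the arrows; return it on reaching the output element."""
    node = "input"
    for _ in range(max_steps):
        element = diagram[node]
        kind = element[0]
        if kind == "output":
            return word
        if kind == "input":
            node = element[1]
        elif kind == "op":
            word = element[1](word)  # operators transform the word
            node = element[2]
        elif kind == "id":
            # identifiers never change the word; they only choose the arrow
            node = element[2] if element[1](word) else element[3]
        else:
            raise ValueError(f"unknown element kind {kind!r}")
    return None  # gave up: treated as "not applicable" in this sketch


erase_a = {
    "input": ("input", "has_a"),
    "has_a": ("id", lambda w: "a" in w, "drop_a", "output"),
    "drop_a": ("op", lambda w: w.replace("a", "", 1), "has_a"),
    "output": ("output",),
}
print(run_diagram(erase_a, "banana"))  # bnn
```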

In normal algorithms use is made only of one type of elementary 
operator, termed substitution operators, and one type of elementary i- 
dentifier, termed occurrence identifier. We shall describe these iden- 
tifiers and operators in more detail. To do this we shall first ac- 
quaint ourselves with the concept of occurrence of one word in another. 

Let p and q be two arbitrary words in a particular alphabet. We say
that the word q occurs in the word p if the word p can be represented in
the form $p = p_1 q p_2$, where $p_1$ and $p_2$ are some words, possibly
even empty ones. An occurrence of the word q in the word p is termed the
first left (or simply first) occurrence if, in the representation
$p = p_1 q p_2$ considered, the word $p_1$ has the shortest possible length
among all such representations of the word p.
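In Python the first left occurrence corresponds to the lowest index returned by the built-in `str.find`, which makes the split $p = p_1 q p_2$ with the shortest $p_1$ a one-liner. The function name `first_occurrence` is ours.

```python
from typing import Optional, Tuple


def first_occurrence(p: str, q: str) -> Optional[Tuple[str, str]]:
    """Split p as p1 + q + p2 with p1 as short as possible (the first left occurrence).

    Returns (p1, p2), or None if q does not occur in p.
    """
    i = p.find(q)  # str.find returns the lowest index at which q starts
    if i < 0:
        return None
    return p[:i], p[i + len(q):]


print(first_occurrence("xxyxyxx", "xy"))  # ('x', 'xyxx'), i.e. x(xy)xyxx
print(first_occurrence("xxyxyxx", "yy"))  # None
print(first_occurrence("xxyxyxx", ""))    # ('', 'xxyxyxx')
```

The last line reflects the convention that the empty word occurs in every word, with nothing to the left of its first occurrence.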

The occurrence identifier is given by indicating some fixed word q,
and its application consists in checking, for any given word p, whether or
not the word q occurs in p. The substitution operator is usually given in
the form of two words connected by an arrow, $q_1 \to q_2$. Its action
consists in substituting the word $q_2$ in place of the first left
occurrence of the word $q_1$ in any given word p. If we single out
explicitly the first occurrence of the word $q_1$ in the word p, writing p
in the form $p_1 q_1 p_2$, then after the application of this operator p is
transformed into the word $p_1 q_2 p_2$.
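A sketch of the substitution operator $q_1 \to q_2$ in Python: the built-in `str.replace` with `count=1` replaces exactly the leftmost occurrence, i.e. it turns $p_1 q_1 p_2$ into $p_1 q_2 p_2$. Returning None when $q_1$ does not occur is our own convention for "not applicable"; the name `substitute` is hypothetical.

```python
from typing import Optional


def substitute(p: str, q1: str, q2: str) -> Optional[str]:
    """Substitution operator q1 -> q2: replace the first left occurrence of q1 by q2.

    Returns None when q1 does not occur in p, i.e. the operator is not applicable.
    """
    if q1 not in p:
        return None
    return p.replace(q1, q2, 1)  # count=1: only the leftmost occurrence is replaced


print(substitute("xxyxyxx", "xy", "yx"))  # xyxxyxx
print(substitute("xxyxyxx", "xy", ""))    # xxyxx
```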


In applying the occurrence identifier we agree to set off the
(first left) occurrence found of the identified word in the given word by
parentheses. For example, applying the occurrence identifier of the word
q = xy to the word p = xxyxyxx, we set off the first occurrence of q in p
as follows: p = x(xy)xyxx.

Algorithms represented by graph-diagrams consisting exclusively of
word occurrence identifiers and substitution operators are termed
generalized normal algorithms. Here it is assumed that each substitution
operator of the form $q_1 \to q_2$ is entered by only a single arrow: the
arrow with the "+" sign emerging from the $q_1$ identifier.

An example of a graph-diagram of a generalized normal algorithm is
shown in Fig. 1. In this figure the identifiers are shown as rectangles.
The operator xy → denotes the substitution of the empty word in place of
the first occurrence of the word xy. In accordance with the notation for
the empty word used in the preceding section, this operator can also be
written in the form xy → e.

Considering the operation of the algorithm A given by the graph-diagram
of Fig. 1, we note that the first operator from the top moves x's to the
left and y's to the right portion of the word until the word takes the
form xx...xyy...y (all x's precede all y's). Only after the word has been
reduced to this form does the second operator come into action,
annihilating the pairs xy until only x's or only y's remain in the word.
If the originally given word p contained m x's and n y's, then as a result
of the operation of the algorithm A it is transformed into the word
q = A(p), of length |m - n|, consisting only of x's (if m > n) or only of
y's (if n > m).
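Fig. 1 itself is not reproduced here. Assuming, from the description, that the first identifier-operator pair is yx with the substitution yx → xy and the second is xy with xy → e, and that each operator's arrow returns to its own identifier (the "-" arrow of the first identifier leading to the second, that of the second to the output), the behavior can be simulated directly in Python. This wiring is our reconstruction, and the function name is hypothetical.

```python
def fig1_algorithm(word: str) -> str:
    """Sketch of the Fig. 1 generalized normal algorithm (wiring assumed, see above)."""
    # First identifier/operator pair: while "yx" occurs, apply yx -> xy to its
    # first occurrence; this moves x's to the left and y's to the right.
    while "yx" in word:
        word = word.replace("yx", "xy", 1)
    # Second pair: while "xy" occurs, erase its first occurrence (xy -> e).
    while "xy" in word:
        word = word.replace("xy", "", 1)
    return word


print(fig1_algorithm("yxyxx"))  # x  (three x's, two y's)
```

The result agrees with the text: only |m - n| letters of the more numerous kind remain.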

Having considered the generalized normal algorithms, let us turn to
the characterization of the normal algorithms themselves. Those generalized
normal algorithms whose graph-diagrams have a certain special form are
termed normal algorithms. To describe this form, we note that, by the
definition of generalized normal algorithms given above, every operator
$q_1 \to q_2$ occurs in the graph-diagram of such an algorithm paired with
the identifier $q_1$.

Let us combine each such pair of elements in the graph-diagram into a
single element, retaining for it the notation of the corresponding
operator.





[Fig. 1. a) Input; b) output.]
[Fig. 2. a) Input; b) output.]


From each combined element two arrows will emerge: an arrow with the
symbol "+", along which the word that has undergone the action of the
element's operator is directed, and an arrow with the symbol "-", along
which the word is directed if the element's operator is not applicable to
it. Nonapplicability of the substitution operator to a word means that the
left side of the operator (the word $q_1$ in the operator $q_1 \to q_2$)
does not occur in the given word.
Using the described technique for combining elements, the graph-diagram
of the algorithm shown in Fig. 1 can be redrawn as the diagram shown in
Fig. 2. In the case of normal algorithms, such a graph-diagram with
combined elements must satisfy the following conditions:

a) all the combined (identifier-operator) elements of the graph-diagram
are ordered by assigning them the sequential numbers from 1 to n; the
negative output (the arrow with the symbol "-") of the i-th element is
connected to the (i + 1)-th element (i = 1, 2, ..., n - 1), and the
negative output of the n-th element is connected to the output element of
the graph-diagram;

b) the positive outputs (arrows with the symbol "+") of all the
combined elements are connected either to the first element or to the
output element of the graph-diagram. In the first case the substitution of
the operator of the corresponding element is termed ordinary, in the
second case it is termed final;

c) the input element is connected by an arrow to the first combined
(identifier-operator) element.
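Conditions a) to c) fix the whole wiring once the list of substitutions and their kinds are known. The sketch below uses a hypothetical representation (elements numbered 1..n, each mapped to its pair of "+" and "-" successors) to build that wiring; in particular, a positive arrow can point only to element 1 or to the output, which is the requirement the text says Fig. 2 violates.

```python
def normal_graph(substitutions: list) -> dict:
    """Wire the combined elements 1..n of a normal algorithm by conditions a)-c).

    Each substitution is a (left word, right word, is_final) triple. The result
    maps "input" to its successor and each element i to its ("+", "-") successors.
    """
    n = len(substitutions)
    edges = {"input": 1 if n else "output"}             # condition c)
    for i, (_, _, is_final) in enumerate(substitutions, start=1):
        plus = "output" if is_final else 1              # condition b)
        minus = i + 1 if i < n else "output"            # condition a)
        edges[i] = (plus, minus)
    return edges


print(normal_graph([("yyx", "y", False), ("xx", "y", False), ("yyy", "x", True)]))
# {'input': 1, 1: (1, 2), 2: (1, 3), 3: ('output', 'output')}
```

The case of an empty list (input wired straight to the output) is our own extension; the text does not discuss it.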

A graph-diagram represents an ordinary normal algorithm, rather than
merely a generalized normal algorithm, if and only if it satisfies these
conditions. It is easy to verify that the graph-diagram shown in Fig. 2 is
not the graph-diagram of a normal algorithm, since it does not satisfy the
second of the conditions just formulated (condition b).

Normal algorithms are customarily represented not by graph-diagrams
but simply by the ordered list of the substitutions of all the operators
of the given algorithm, termed the diagram of the algorithm. Here ordinary
substitutions are written, as shown above, as two words connected by an
arrow ($q_1 \to q_2$), while final substitutions are denoted by an arrow
with a dot ($q_1 \to\cdot\, q_2$).

The order in which the substitutions are performed is then completely
determined by conditions a), b) and c). Indeed, by these conditions an
arbitrary i-th substitution of the algorithm diagram must be performed if,
and only if, it is the first applicable substitution (none of the
substitutions from the 1st to the (i - 1)-th is applicable). The process
of performing substitutions terminates only when none of the substitutions
of the diagram is applicable to the word obtained or when some final
substitution is performed (for the first time).

As an example, let us consider the operation of the normal algorithm
A given by the diagram

$$\begin{aligned} yyx &\to y \\ xx &\to y \\ yyy &\to\cdot\, x \end{aligned}$$

Let us assume that we are given the input word p = xyxxxyy. The first
substitution of the algorithm A is not applicable to this word; to apply
the second substitution we set off the first occurrence of its left side
(xx) in the word p: p = xy(xx)xyy. After performing the second substitution
we obtain the word $p_1 = xyyxyy$, to which the first substitution of the
algorithm is applicable: $p_1 = x(yyx)yy \to xyyy = p_2$. Only the third
substitution is applicable to the resulting word:
$p_2 = x(yyy) \to xx = p_3$, and since it is a final substitution, the word
$p_3$ is the final result of the action of the algorithm A on the original
word p, i.e., $p_3 = A(p)$.

If the third substitution of the algorithm A were not final, the
process of substitution could be continued, and in place of the word
$p_3 = xx$ we would obtain the word $p_4 = y$ as the result of the action
of the algorithm on the original word p.
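The execution rule for normal algorithms and the worked example can be checked with a small interpreter. In this Python sketch (names and the (left, right, is_final) encoding are ours), the diagram of the example is read as yyx → y, xx → y, yyy →· x. As with any such interpreter, non-termination cannot be detected in general, so the sketch stops after `max_steps` and returns None.

```python
from typing import Optional


def run_normal(scheme: list, word: str, max_steps: int = 10_000) -> Optional[str]:
    """Apply a normal algorithm given as an ordered list of (left, right, is_final).

    At each step the first substitution in diagram order whose left side occurs
    in the word is applied to the first (leftmost) occurrence. The process stops
    after a final substitution or when no substitution is applicable.
    """
    for _ in range(max_steps):
        for left, right, is_final in scheme:
            if left in word:
                word = word.replace(left, right, 1)
                if is_final:
                    return word
                break
        else:  # no substitution of the diagram is applicable
            return word
    return None


# The example diagram: yyx -> y, xx -> y, yyy ->. x (the last one final).
A = [("yyx", "y", False), ("xx", "y", False), ("yyy", "x", True)]
# The variant in which the third substitution is ordinary.
A_ordinary = [("yyx", "y", False), ("xx", "y", False), ("yyy", "x", False)]

print(run_normal(A, "xyxxxyy"))           # xx
print(run_normal(A_ordinary, "xyxxxyy"))  # y
```

Since every substitution of `A_ordinary` shortens the word, it halts on every input, and the identity A(A(p)) = A(p) discussed next can be checked on it directly.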
The use of final substitutions in normal algorithm diagrams along with
ordinary substitutions is necessary in order to be able to realize in such
diagrams arbitrary constructive alphabetic operators, i.e., alphabetic
operators which are determined by a finite number of rules. Indeed, any
normal algorithm A whose diagram does not contain a single final
substitution can terminate its operation only when none of its
substitutions is applicable any longer. This implies directly that
repeated application of the algorithm A to the word A(p), obtained by
applying it to any input word p, cannot change this word. In other words,
the algorithm A satisfies the following identity (valid for any input word
p in its domain of definition) (see Markov [53]):

A(A(p)) = A(p). (5) 

By no means every constructive alphabetic operator satisfies this
relation. An example of an alphabetic operator for which relation (5) is
not valid is the operator B, whose action on any word p consists in
prefixing some fixed letter x to the left of this word: B(p) = xp; indeed,
B(B(p)) = xxp differs from B(p) = xp.

From what we have said above it is clear that this operator cannot be
realized by a normal algorithm whose diagram contains no final
substitutions.

At the same time it is easy to verify that this operator is realized
by the normal diagram consisting of the single final substitution
$\to\cdot\, x$ (or, what is the same, $e \to\cdot\, x$). Indeed, by the
definition of occurrence adopted above, the empty word occurs in every word
p, and its first occurrence has no letter to its left. It follows directly
that applying this substitution to an arbitrary word p converts it into
the word xp.
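Python's `str.replace` happens to follow the same convention for the empty word: its first occurrence is at position 0, so replacing it once by x prefixes x. The sketch below (the function name `B` simply mirrors the text) realizes the single final substitution e →· x and shows that B(B(p)) differs from B(p), so relation (5) fails.

```python
def B(p: str) -> str:
    """Normal algorithm with the single final substitution e ->. x.

    The empty word occurs in every word p, first at position 0 (p.find("") == 0),
    so replacing that first occurrence once by x prefixes x to p.
    """
    return p.replace("", "x", 1)


for p in ["", "y", "xy"]:
    print(repr(p), "->", repr(B(p)), "and B(B(p)) =", repr(B(B(p))))
```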

It is no less evident that in constructing the theory of normal
algorithms we cannot limit ourselves to final substitutions only. Indeed,
a normal algorithm whose diagram consists only of final substitutions acts
on each input word p with at most one of these substitutions, after which
the output word A(p) is obtained immediately. Since the algorithm diagram
is finite, the absolute values of the differences between the lengths of
the words p and A(p) are bounded uniformly (for every choice of the input
word p) by a single number N (the maximum absolute difference between the
lengths of the left and right sides of the substitutions of algorithm A).

There do exist, however, simple constructive operators for which the
absolute differences between the lengths of the input and corresponding
output words are not uniformly bounded. An example of such an operator is
the operator D, which doubles the input word: its action on any input word
p is determined by the equality D(p) = pp, so that the difference in
length equals the length of p and grows without bound. From what we have
said above, it is clear that representing this operator as a normal
algorithm whose diagram contains only final substitutions is impossible.

Thus, if we require an algorithmic system based on normal algorithms
to be universal (i.e., to make it possible to construct a normal algorithm
equivalent to any a priori specified algorithm), then a necessary
condition for such universality is the use of both forms of substitution,
final and ordinary. This condition is also sufficient; that is, we can
formulate the normalization principle (see Ref. 53).

Normalization principle. For any algorithm (constructively given
alphabetic mapping) in an arbitrary finite alphabet A we can construct an
equivalent normal algorithm on the alphabet A.

The concept of a normal algorithm on an alphabet, used in the
formulation of the normalization principle, means the following. In many
cases it is not possible to construct a normal algorithm equivalent to a
given algorithm (in the alphabet A) if only letters of the alphabet A are
used in the substitutions of the algorithm. However, the required normal
algorithm can be constructed by adding to the alphabet A some number of new
letters or, as is usually said, by performing an expansion of the alphabet
A. In this case it is customary to say that the constructed (normal)
algorithm is an algorithm on the alphabet A. We agree, however, that in
spite of the expansion of the alphabet, the algorithm will, as before, be
applied only to words in the original alphabet A.
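A normal algorithm for the doubling operator D(p) = pp on the alphabet {a, b}, written by us for illustration (it is not taken from Markov or from this book), shows both points above: it needs ordinary as well as final substitutions, and it works only after the alphabet is expanded by the auxiliary letters A, B, α, β. The marker α sweeps right, leaving a marked copy (A or B) after each letter; marked copies drift right past original letters; when α reaches the end it becomes β, which sweeps left unmarking the copies; the final substitution β →· e stops the process. The order of the lines matters: the last line e → α can fire only at the start, since afterwards α or β is always present and an earlier line applies. The compact interpreter repeats the usual rule (first applicable substitution, first occurrence) and gives up after `max_steps`.

```python
from typing import Optional


def run_normal(scheme: list, word: str, max_steps: int = 10_000) -> Optional[str]:
    """Minimal normal-algorithm interpreter: first applicable substitution, first occurrence."""
    for _ in range(max_steps):
        for left, right, is_final in scheme:
            if left in word:
                word = word.replace(left, right, 1)
                if is_final:
                    return word
                break
        else:  # nothing applicable
            return word
    return None


# Alphabet {a, b}, expanded by the auxiliary letters A, B (marked copies), α, β.
DOUBLING = [
    ("Aa", "aA", False),   # marked copies drift right past original letters
    ("Ab", "bA", False),
    ("Ba", "aB", False),
    ("Bb", "bB", False),
    ("αa", "aAα", False),  # α passes a letter and leaves a marked copy of it
    ("αb", "bBα", False),
    ("Aβ", "βa", False),   # β moves left, unmarking the copies
    ("Bβ", "βb", False),
    ("α", "β", False),     # α has reached the end of the word
    ("β", "", True),       # final substitution: erase β and stop
    ("", "α", False),      # e -> α: put the marker at the start (fires only once)
]

print(run_normal(DOUBLING, "aab"))  # aabaab
```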

As shown by Markov [53] and Nagornyy [58], if a normal algorithm
equivalent to a given algorithm in the alphabet A can be constructed by
adjoining to the alphabet A some (possibly very large) finite number of
letters, then an equivalent normal algorithm can be constructed by
adjoining to the alphabet A only a single additional letter.

It is not possible to give a rigorous mathematical proof of the
normalization principle, since the concept of an arbitrary algorithm is
not a rigorously defined mathematical concept. We must therefore approach
its substantiation just as we approach the substantiation of any law or
principle of natural science. The substantiation that can be given for the
normalization principle in this framework makes it possible to regard the
principle as credible to a very high degree. We shall indicate the main
lines of this substantiation. To simplify the formulations, we agree,
following Markov [53], to term a particular algorithm normalizable if an
equivalent normal algorithm can be constructed for it (possibly using an
expansion of the alphabet), and unnormalizable otherwise. We can now state
the normalization principle in a somewhat altered form.


All algorithms are normalizable.
The validity of this principle rests first of all on the fact that
all the algorithms known at the present time are normalizable. Since in
the course of the long history of the exact sciences a considerable number
of different algorithms have been devised, this fact is convincing in
itself.

In actuality it is even more convincing. We can show that all the
methods now known for composing algorithms, which make it possible to
construct new algorithms from already known ones, do not go beyond the
class of normalizable algorithms. In other words, if the original
algorithms were normalizable, then any compositions of these algorithms
(among the forms of composition known at the present time) will also be
normalizable. This implies that constructing an example of an
unnormalizable algorithm would require techniques qualitatively different
from everything mathematicians have encountered up to now.

However, this is not all. A whole series of scientists have made
special attempts to construct algorithms of a more general form, and none
of these attempts has gone beyond the class of normalizable algorithms. We
shall consider one of these attempts (the algorithmic scheme of
Kolmogorov-Uspenskiy) below. The failure of these attempts is in itself
the most striking evidence in favor of the validity of the normalization
principle.

Thus the normalization principle should be considered sufficient- 
ly substantiated, although this substantiation does not exclude com- 
pletely the possibility of its refutation in the future (by construc- 
tion of an example of an unnormalizable algorithm). In any case, the 
normalizable algorithms encompass a significant portion of the algorithms (if not all of them), and therefore the system of normal algorithms can be considered in practice to be a universal algorithmic system.

Let us now consider some of the common forms of composition of algorithms mentioned above. We shall define the composition not of the algorithms themselves but of their corresponding alphabetic representations. However, as remarked above, the fact that the result of composing normal algorithms can always be normalized makes it possible (at least within the class of normal algorithms) to extend the definition of the composition of representations to the composition of the algorithms themselves.

One of the most common forms of composition of algorithms (repre- 
sentations) is the superposition of algorithms. In the superposition 
of the two algorithms A and B the output word of the first algorithm 
(A) is considered as the input word of the second algorithm (B), so 
that the result of the superposition of the algorithms A and B can be 
represented in the form D(p) = B(A(p)). This definition extends to the 
superposition of any finite number of algorithms. 
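A minimal Python sketch of superposition follows. It assumes (our modelling choice, not the book's) that an algorithm is a Python function on words that returns None where it is not applicable; a genuine algorithm may instead run forever, which this model does not capture. The names `superpose`, `double` and `drop_first` are hypothetical.

```python
from typing import Callable, Optional

# Hypothetical model: an algorithm is a partial map on words,
# with None standing for "not applicable to this input word".
Algorithm = Callable[[str], Optional[str]]


def superpose(*algorithms: Algorithm) -> Algorithm:
    """Return D with D(p) = B(A(p)) for two algorithms, and likewise for more."""
    def d(p: str) -> Optional[str]:
        word: Optional[str] = p
        for alg in algorithms:
            if word is None:  # an earlier algorithm was not applicable
                return None
            word = alg(word)
        return word
    return d


def double(p: str) -> Optional[str]:
    return p + p


def drop_first(p: str) -> Optional[str]:
    return p[1:] if p else None  # not applicable to the empty word


d = superpose(double, drop_first)  # D(p) = drop_first(double(p))
print(d("ab"))  # bab
print(d(""))    # None
```

The domain of D comes out automatically: D(p) is defined exactly when A is applicable to p and B is applicable to A(p).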

A superposition of generalized normal algorithms can be considered as a generalized normal algorithm. For this it is sufficient to combine the output element of the graph-diagram of each preceding algorithm with the input element of the succeeding algorithm. The normalization of a superposition of normal algorithms requires considerable skill; however, it too can always be accomplished [53].

We shall point out some other forms of compositions of algorithms. 

The union of the algorithms A and B in the same alphabet X is the algorithm C in the same alphabet which transforms any input word p lying in the intersection of the domains of definition of the algorithms A and B into the words A(p) and B(p) written side by side; this algorithm is considered undefined on all the remaining input words.
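Under the same hypothetical model as before (an algorithm is a Python function on words, with None meaning "not applicable"), the union can be sketched as follows. The helper algorithms `reverse` and `erase_a` are illustrative only.

```python
from typing import Callable, Optional

# Hypothetical model: None means "algorithm not applicable".
Algorithm = Callable[[str], Optional[str]]


def union(a: Algorithm, b: Algorithm) -> Algorithm:
    """C(p) = A(p) followed by B(p); defined only where both A and B are defined."""
    def c(p: str) -> Optional[str]:
        ra, rb = a(p), b(p)
        if ra is None or rb is None:
            return None
        return ra + rb  # the two output words written side by side
    return c


def reverse(p: str) -> Optional[str]:
    return p[::-1]


def erase_a(p: str) -> Optional[str]:
    # Illustrative partial algorithm: defined only on words containing a "b".
    return p.replace("a", "") if "b" in p else None


c = union(reverse, erase_a)
print(c("ab"))  # bab
print(c("aa"))  # None
```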
A ramification of algorithms is a composition of the three algorithms A, B and C. Designating the result of this composition by D, we shall consider that the domain of definition of the algorithm D coincides with the intersection of the domains of definition of all three algorithms A, B and C, and that for any word p from this intersection D(p) = A(p) if C(p) = e, and D(p) = B(p) if C(p) ≠ e.
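A ramification can be sketched in the same hypothetical Python model (None means "not applicable", and the empty string plays the role of the empty word e). The algorithms `append_a`, `append_b` and `selector` are made up for the example.

```python
from typing import Callable, Optional

Algorithm = Callable[[str], Optional[str]]  # hypothetical model; None = not applicable
EMPTY = ""  # the empty word e


def ramify(a: Algorithm, b: Algorithm, c: Algorithm) -> Algorithm:
    """D(p) = A(p) if C(p) = e, B(p) if C(p) != e; defined where A, B and C all are."""
    def d(p: str) -> Optional[str]:
        ra, rb, rc = a(p), b(p), c(p)
        if ra is None or rb is None or rc is None:
            return None
        return ra if rc == EMPTY else rb
    return d


def append_a(p: str) -> Optional[str]:
    return p + "a"


def append_b(p: str) -> Optional[str]:
    return p + "b"


def selector(p: str) -> Optional[str]:
    # Yields the empty word exactly when p begins with "a".
    return EMPTY if p.startswith("a") else "b"


d = ramify(append_a, append_b, selector)
print(d("ab"))  # aba
print(d("ba"))  # bab
```

The algorithm C acts purely as a test: only whether its output is empty matters, not the output itself.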

A repetition (iteration) is a composition of the two algorithms A and B. Designating the result of this composition by P, we define the output word P(q) corresponding to any input word q by the following condition: there exists a sequence of words $q = q_0, q_1, \ldots, q_n = P(q)$ such that $q_i = A(q_{i-1})$ for all $i = 1, 2, \ldots, n$, $B(q_i) \neq e$ for all $i = 0, 1, \ldots, n-1$, and $B(q_n) = e$. In other words, the algorithm A is applied sequentially several times until a word is obtained which is transformed by the algorithm B into the empty word e (we can, of course, select any other fixed word rather than the empty word).
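In the same hypothetical Python model, iteration becomes a loop that tests with B before each application of A. Because a real iteration may never reach a word that B sends to e, the sketch adds a step bound `max_steps`; that bound is our own device and is not part of the definition.

```python
from typing import Callable, Optional

Algorithm = Callable[[str], Optional[str]]  # hypothetical model; None = not applicable
EMPTY = ""


def iterate(a: Algorithm, b: Algorithm, max_steps: int = 10_000) -> Algorithm:
    """P(q): apply A repeatedly, starting from q, until B yields the empty word."""
    def repeat(q: str) -> Optional[str]:
        word = q                    # q_0 = q
        for _ in range(max_steps):
            check = b(word)
            if check is None:
                return None
            if check == EMPTY:      # B(q_n) = e: stop and output q_n
                return word
            nxt = a(word)           # q_{i+1} = A(q_i)
            if nxt is None:
                return None
            word = nxt
        return None  # step bound reached: treated as "no result" (our assumption)
    return repeat


def drop_first(q: str) -> Optional[str]:
    return q[1:] if q else None


def starts_with_b(q: str) -> Optional[str]:
    return EMPTY if q.startswith("b") else "a"


p = iterate(drop_first, starts_with_b)
print(p("aab"))  # b
print(p("aaa"))  # None
```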

All the methods described for the composition of the normal algo- 
rithms lead to normalizable algorithms [53]. 

Of very great importance for the normal algorithms, just as for 
every universal algorithmic system, is the problem of the construction 
of the so-called universal algorithm. Let us consider the universal 
algorithm in application to the normal algorithms. 

Suppose we are required to construct a normal algorithm which will perform the operation of any normal algorithm if we are given the diagram (substitution set) of this latter algorithm.

The exact formulation of the problem of the universal algorithm can be accomplished by various methods. We shall describe one of the most natural. To do this we first of all fix some standard alphabet (for example, binary). For all other possible alphabets we fix some definite method of coding their letters in the selected standard alphabet. In the case of the binary standard alphabet this can be done, for example, as follows: the letters of any given alphabet are numbered sequentially by the natural numbers, after which the i-th letter is assigned the binary code that begins and ends with zero and has exactly i ones between these zeros. If the total number of letters in the given alphabet is equal to n, then we also introduce the additional ((n + 1)-st, (n + 2)-nd, etc.) letters to designate the symbols used in the diagrams of normal algorithms (arrow, dot, separation sign between formulas) and also to designate the special end sign which stands at the beginning and end of the algorithm diagram.

After writing the algorithm diagram as a single word and coding the letters of this word by the method just described, we obtain a word in the standard alphabet, which is termed the transform of the given algorithm. For example, for the normal algorithm A given by the diagram

    xy → x
    y →·

the transform A* of the algorithm A in the binary alphabet can be obtained as follows. We fix the numeration of the letters, considering x to be the first, y the second, the arrow the third, the dot the fourth, the separation symbol the fifth, and the end symbol the sixth letter. Then the transform A* of the algorithm A is written as: 060 010 020 030 010 050 020 030 040 060. Here, for brevity, instead of writing out a row of n ones (for any positive n) we have written the number n itself.
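To make the coding concrete, here is a hypothetical Python sketch that computes the transform of a normal algorithm given as a list of formulas. The triple encoding of formulas and the helper names are our own; the numbering of the extra letters (arrow, dot, separator and end sign as n + 1, ..., n + 4) follows the example in the text. Applied to the example diagram xy → x, y →· it reproduces the abbreviated transform 060 010 020 030 010 050 020 030 040 060.

```python
import re
from typing import List, Tuple

# A substitution formula as (left side, right side, is_terminal): our own encoding.
Formula = Tuple[str, str, bool]


def letter_code(i: int) -> str:
    """Binary code of the i-th letter: a zero, exactly i ones, a zero."""
    return "0" + "1" * i + "0"


def transform(formulas: List[Formula], alphabet: str) -> str:
    n = len(alphabet)
    number = {letter: k + 1 for k, letter in enumerate(alphabet)}
    # Extra letters numbered as in the example: arrow, dot, separator, end sign.
    arrow, dot, separator, end = n + 1, n + 2, n + 3, n + 4
    symbols = [end]
    for k, (left, right, terminal) in enumerate(formulas):
        if k > 0:
            symbols.append(separator)
        symbols += [number[ch] for ch in left]
        symbols.append(arrow)
        if terminal:
            symbols.append(dot)
        symbols += [number[ch] for ch in right]
    symbols.append(end)
    return "".join(letter_code(i) for i in symbols)


def abbreviate(code: str) -> str:
    """Write each run of ones as its length, as in the text: 0110 -> 020."""
    return " ".join(f"0{len(m) - 2}0" for m in re.findall(r"01+0", code))


# The diagram  xy -> x ;  y ->. (terminal)  over the alphabet {x, y}
print(abbreviate(transform([("xy", "x", False), ("y", "", True)], "xy")))
# 060 010 020 030 010 050 020 030 040 060
```

Because every letter code begins and ends with a zero and contains at least one 1, the transform can be split back into letters unambiguously, which is what `abbreviate` relies on.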

Along with the transform of the algorithm A, the coding in the standard alphabet described above also yields the transform p* of any input word p of this algorithm.

The following theorem on the universal normal algorithm is valid 
(see Markov [53]). 

There exists a normal algorithm U, termed a universal normal algorithm, which, for any normal algorithm A and any input word p from the domain of definition of A, transforms the word A*p*, obtained by appending the transform of the word p to the transform of the algorithm A, into the transform of the output word A(p) into which the algorithm A transforms the word p. If, however, the word p is chosen so that the algorithm A is not applicable to it, then the universal algorithm U is not applicable to the word A*p*.

This theorem is of tremendous value, since it implies the possibility of constructing a machine which can perform the operation of any normal algorithm and hence, in view of the normalization principle, of any arbitrary algorithm. For this purpose it is sufficient to insert into the machine a program, i.e., the transform of the normal (normalized) algorithm whose operation the machine is to perform.
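The idea of one fixed program whose behavior is set entirely by a diagram supplied as data can be illustrated with a short Python interpreter for normal algorithms. This is only an analogue: it is a Python program, not the universal normal algorithm U of the theorem, and it reads diagrams as lists of formulas rather than their binary transforms. The step bound `max_steps` is our own assumption, needed because the interpreter cannot in general know in advance whether the algorithm will stop.

```python
from typing import List, Optional, Tuple

Formula = Tuple[str, str, bool]  # (left side, right side, is_terminal): our encoding


def run_normal_algorithm(formulas: List[Formula], word: str,
                         max_steps: int = 10_000) -> Optional[str]:
    """Apply a normal algorithm given by its diagram (list of formulas) to a word."""
    for _ in range(max_steps):
        for left, right, terminal in formulas:
            pos = word.find(left)      # leftmost occurrence; "" is found at 0
            if pos != -1:
                word = word[:pos] + right + word[pos + len(left):]
                if terminal:
                    return word        # a terminal formula stops the algorithm
                break                  # restart from the first formula
        else:
            return word                # no formula applies: the algorithm stops
    return None  # step bound exhausted; we treat this as "no result" (assumption)


diagram = [("xy", "x", False), ("y", "", True)]   # xy -> x ;  y ->.
print(run_normal_algorithm(diagram, "xyy"))  # x
print(run_normal_algorithm(diagram, "yx"))   # x
print(run_normal_algorithm([("", "x", False)], "a", max_steps=100))  # None
```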

However, although it has been proved that all the algorithms known at the present time can in principle be normalized, actually carrying out the normalization is a very laborious matter even for relatively simple algorithms (the algorithm for the multiplication of two whole numbers, for example). This means that programming a machine which simulates the universal normal algorithm would be excessively unwieldy and impractical. Therefore, in practice, machines capable of realizing the operation of any algorithm are designed on the basis of other algorithmic systems, which differ from the system of normal algorithms. These systems are described in Chapter 5.
§3. THE KOLMOGOROV-USPENSKIY ALGORITHMIC DIAGRAM 

The present section describes the method suggested by Kolmogorov and Uspenskiy [43] for defining algorithms of the most general form. To construct the corresponding algorithmic diagram they chose a method based only on those properties which are unquestionably inherent in any algorithmic diagram, realizing these properties in particular specific forms without any loss of generality.

In constructing such a generalized algorithmic diagram it is useful to have in mind, as a visual model, a man who is performing computation or other processing of information in accordance with a particular, precisely prescribed system of rules. The man plays the role of the information converter, while the converted information itself is located outside the man. We shall assume for definiteness that this information is written on sheets of paper, and that the man has at his disposal an unlimited supply of clean sheets and unlimited space for storing filled-out sheets. The transformation of the information performed by the man is broken down into individual discrete steps. At each such step the man surveys some number of filled-out sheets and, depending on their contents, performs certain alterations in the surveyed information, using a strictly defined, time-invariant system of rules stored in his memory. These alterations may be of three forms: erasure (annihilation) of all or part of the surveyed information, recording of new information on the surveyed sheets, and alteration of the set of surveyed sheets.

At first glance it seems that requiring the system of rules to be invariant significantly narrows the range of problems considered in comparison with the problems which man can actually solve, since man is capable of altering the rules in the course of the work. In fact this limitation is not significant, since the nature of the alteration of the information at each step of the processing depends not only on the rules of the transformation but also on the information itself. Consequently, if it becomes necessary to vary the nature of the information transformation in the course of time, one can introduce the corresponding changes into the information itself rather than into the rules stored in the memory of the processor; in other words, one can write the required alterations of the rules down on the sheets of paper instead of memorizing them.

An absolutely necessary limitation in the design of any algorithmic system is that the information processor can take in only a limited quantity of information at any given instant. If the total volume of the material being processed exceeds the volume of this active zone of the processor, then the information must be brought into processing gradually, step by step.

After these preliminary remarks we turn directly to the description of the Kolmogorov-Uspenskiy diagram. The information in this diagram, as in alphabetic conversions generally, is written with the aid of a finite number of symbols (letters), which we shall designate as $T_0, T_1, \ldots, T_N$. To achieve the greatest possible generality, we shall also establish certain relations between the symbols, these relations belonging to one of the types $R_1, R_2, \ldots, R_m$. For each type of relation $R_i$ we fix the number $k_i$ of related symbols (letters). We designate by K the maximal number among the numbers $k_1, \ldots, k_m$. The relations between the symbols are introduced in order to take account of the case of complex letters which designate, for example, entire phrases in ordinary language. In that case the composition of the phrase (letter) may include indications of the relative position of information (other letters) which is directly related to the letter in question (say, information which must be brought into consideration in the following step of the algorithm). The number of related symbols is bounded because the information contained in each letter is bounded (otherwise the letter could not be contained entirely in the active zone and would have to be divided into individual portions).

Let us assume that all the relations in which any given letter can occur are ordered in some way and numbered, and that the total number of such relations is bounded by the same number s. We shall use circles to designate the letters, introducing when necessary a numeration of these circles with numerals written adjacent to the corresponding circles. These numerals have no relation to the type of symbol (letter) designated by the given circle. When necessary, the symbol of the corresponding letter is written inside the circle which represents it.

Fig. 3

Any relationship between symbols (letters) can now be represented as shown in Fig. 3.

The subscripts of $p_1, p_2, \ldots, p_k$ in this figure show the position occupied by the relation in question in the ordered set of relations for the corresponding letter (designated by the numbered circles). These subscripts (regardless of the choice of the letters and the form of the relation R) can take only the values 1, 2, ..., s.

We can considerably simplify the writing of the information in this diagram by adding s + K + m letters to the letters $T_0, T_1, \ldots, T_N$: s letters to designate the numbers of the relations of any given element (squares in Fig. 3), K letters to designate the numbers of the letters taking part in any given relation R (triangles in Fig. 3), and m letters to designate the relations $R_1, R_2, \ldots, R_m$ themselves. If we denote all the new letters by circles as well, then the information takes the form of a set of circles connected pairwise by bonds. There is then no need for any special numeration of the order of occurrence of a letter in particular relations, since, as shown in Fig. 3, all the letters related to any single letter will inevitably be different. Thereby the relations in which a given symbol (letter) occurs are numbered automatically, by the numbers of the symbols (letters) with which the given symbol is related.

Thus, finally, the information in the algorithmic diagram is represented by an arbitrary finite set M whose elements are the fixed letters $T_0, T_1, \ldots, T_N$ (N > 1), where each of the letters $T_2, T_3, \ldots, T_N$ can occur in M any number of times, while exactly one of the letters $T_0$ or $T_1$ occurs in M, and only once. On this set a paired relation is established (certain letters "join" pairwise with one another) so that the following condition "a" is satisfied: all the letters connected with any single letter of the set M are pairwise different.
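The two requirements just stated can be checked mechanically. The following Python sketch is a hypothetical encoding of our own: letters are integers (0 for $T_0$, 1 for $T_1$, 2 for $T_2$, ...), vertices are arbitrary names, and lines are unordered pairs of vertices.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple

# Our own representation: vertex -> letter number, plus a list of lines.
Labels = Dict[Hashable, int]
Lines = Iterable[Tuple[Hashable, Hashable]]


def is_information_complex(labels: Labels, lines: Lines) -> bool:
    """Exactly one vertex carries T_0 or T_1, and condition "a" holds."""
    initial = [v for v, letter in labels.items() if letter in (0, 1)]
    if len(initial) != 1:
        return False
    neighbours: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for u, v in lines:
        neighbours[u].add(v)
        neighbours[v].add(u)
    for nbrs in neighbours.values():
        letters = [labels[w] for w in nbrs]
        if len(letters) != len(set(letters)):  # two neighbours share a letter
            return False
    return True


labels = {"v0": 0, "v1": 2, "v2": 3, "v3": 2}
print(is_information_complex(labels, [("v0", "v1"), ("v0", "v2"), ("v2", "v3")]))  # True
print(is_information_complex(labels, [("v0", "v1"), ("v0", "v3")]))                # False
```

In the second call the vertex v0 is joined to two vertices that both carry $T_2$, which condition "a" forbids.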

In other words, the information is in the form of a one-dimensional complex (linear undirected graph), whose vertices (designated by the circles) are identified with the letters $T_0, T_1, \ldots, T_N$ and whose (undirected) lines connecting certain pairs of vertices are identified with the paired relations between the letters described above.
The requirement that the complex in question (the set M) contain one and only one vertex identified either with the letter $T_0$ or with the letter $T_1$ stems from the need to establish a reference point (the center of the active zone) of the information. One of these letters (we assume that it is $T_0$) is required for the complexes designating information whose processing is not yet completed, and the other (in the present case the letter $T_1$) is required for the complexes designating the terminal information from which the final results of the operation of the algorithm must be extracted.

The vertex of the information complex S which is identified with the letter $T_0$ or $T_1$ is termed the initial vertex of the complex. The active zone of the complex S is the subcomplex of S which consists of the vertices (letters) and lines (relations) belonging to the chains of length $\lambda \le P$ that contain the initial vertex, where P is a fixed number determined for the given algorithm. Here and hereafter we use the term chain to designate any finite sequence of vertices $B_1, B_2, \ldots, B_p$ such that any two neighboring vertices in this sequence are connected by lines; the number of these lines (equal to p − 1) is termed the length of the chain, and the lines themselves are also included in the chain.


The set of all vertices of the active zone of the information complex which are connected with the initial vertex by chains of length P but not by chains of lesser length is termed the boundary of this zone. A complex is called connected (bound) if any two of its vertices can be joined by a chain. The set of vertices and lines lying outside the active zone of the complex S is termed the external portion of the complex.
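A breadth-first search from the initial vertex finds the length of the shortest chain to every vertex, which is all that is needed to extract the active zone and its boundary. The sketch below is hypothetical: the function name and the edge-list representation are our own, and the rule for which lines belong to the zone is our reading of the definition above (a line lies on some chain of length at most P through the initial vertex exactly when one of its ends is at distance less than P).

```python
from collections import defaultdict, deque
from typing import Dict, Hashable, Iterable, Set, Tuple


def active_zone(lines: Iterable[Tuple[Hashable, Hashable]], initial: Hashable, P: int):
    """Return (vertices, lines, boundary) of the active zone of radius P."""
    lines = list(lines)
    adjacency: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for u, v in lines:
        adjacency[u].add(v)
        adjacency[v].add(u)
    # Breadth-first search: length of the shortest chain to each vertex.
    dist = {initial: 0}
    queue = deque([initial])
    while queue:
        v = queue.popleft()
        if dist[v] == P:
            continue
        for w in adjacency[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    # Keep a line exactly when one end is at distance < P (the other is then <= P).
    zone_lines = {frozenset((u, v)) for u, v in lines
                  if u in dist and v in dist and min(dist[u], dist[v]) < P}
    boundary = {v for v, d in dist.items() if d == P}
    return set(dist), zone_lines, boundary


triangle = [(0, 1), (0, 2), (1, 2), (2, 3)]
vertices, zone_lines, boundary = active_zone(triangle, initial=0, P=1)
print(sorted(vertices))  # [0, 1, 2]
print(sorted(boundary))  # [1, 2]
```

With P = 1 the line joining the two boundary vertices 1 and 2 is not part of the zone, since any chain through it and the initial vertex has length at least 2.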
Two complexes are termed mutually isomorphic if a one-to-one correspondence can be established between their vertices such that corresponding vertices are designated by identical letters $T_0, T_1, \ldots, T_N$, and corresponding pairs of vertices are either both connected or both not connected to one another. Mutually isomorphic complexes are in essence identical and differ, perhaps, only in the method of their representation (the position of the vertices in the plane, for example).

Since the total number of vertices in the active zone of the information complex of any given algorithm A is bounded (by condition "a" each vertex has at most N + 1 neighbors, and the radius P is fixed), and the number of letters $T_0, T_1, \ldots, T_N$ is also bounded for A, there exists only a finite number of different (pairwise nonisomorphic) active zones $U_1, U_2, \ldots, U_r$. Hence the rules for their processing can be given by a simple correspondence table

$$U_i \to W_i \qquad (i = 1, 2, \ldots, r).$$

The complexes appearing in the right side of this table must have subcomplexes isomorphic to the boundaries of the corresponding active zones $U_i$, and these isomorphisms must be fixed once and for all. In other words, to each vertex lying on the boundary $L(U_i)$ of the active zone $U_i$ there must correspond a completely determined vertex of the complex $W_i$ (i = 1, 2, ..., r). Each of the complexes $W_i$ must satisfy all the conditions imposed above on information complexes; in particular, it must have one and only one initial vertex, designated by the letter $T_0$ or by the letter $T_1$.

With the aid of the constructed correspondence table we define the operator $R_A$ which performs the direct processing of the information complex at each step of the operation of the given algorithm A. In the information complex S under consideration (initial or intermediate) we find the initial vertex. Drawing from it all possible chains of length P, we construct the active zone U and determine its boundary L(U).

Next, we find in the left side of the correspondence table the (single) active zone $U_i$ which is isomorphic to the active zone U just found. Because of the properties of information complexes defined above (in particular property "a") and the connectedness of the complexes $U_i$ and U, only one isomorphism between them is possible. This makes it possible to identify uniquely the vertices lying on the boundary L(U) of the active zone U with the corresponding vertices lying on the boundary $L(U_i)$ of the active zone $U_i$ and, using the identification of vertices fixed in the correspondence table, also with certain vertices of the complex $W_i$.
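Why only one isomorphism is possible can be seen from a simple procedure: map the initial vertex to the initial vertex and walk outward. By condition "a" every neighbor of an already mapped vertex is determined by its letter alone, so no choice ever arises. The Python sketch below is a hypothetical illustration (function names and the vertex/letter encoding are our own) for connected complexes.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Optional, Set, Tuple

Labels = Dict[Hashable, int]  # vertex -> letter number (our encoding)
Lines = Iterable[Tuple[Hashable, Hashable]]


def _adjacency(labels: Labels, lines: Lines) -> Dict[Hashable, Set[Hashable]]:
    adj: Dict[Hashable, Set[Hashable]] = {v: set() for v in labels}
    for u, v in lines:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def rooted_isomorphism(labels1: Labels, lines1: Lines, root1: Hashable,
                       labels2: Labels, lines2: Lines, root2: Hashable
                       ) -> Optional[Dict[Hashable, Hashable]]:
    """Isomorphism of two connected complexes sending root1 to root2, or None."""
    lines1, lines2 = list(lines1), list(lines2)
    if len(labels1) != len(labels2) or labels1[root1] != labels2[root2]:
        return None
    adj1, adj2 = _adjacency(labels1, lines1), _adjacency(labels2, lines2)
    phi = {root1: root2}
    used = {root2}
    queue = deque([root1])
    while queue:
        v = queue.popleft()
        # Condition "a": a neighbour of phi(v) is fixed by its letter alone,
        # so the image of each neighbour of v is forced.
        by_letter = {labels2[w]: w for w in adj2[phi[v]]}
        for w in adj1[v]:
            image = by_letter.get(labels1[w])
            if image is None:
                return None
            if w in phi:
                if phi[w] != image:
                    return None
            elif image in used:
                return None
            else:
                phi[w] = image
                used.add(image)
                queue.append(w)
    if len(phi) != len(labels1):
        return None  # first complex not connected: this method does not apply
    mapped = {frozenset((phi[u], phi[v])) for u, v in lines1}
    if mapped != {frozenset(e) for e in lines2}:
        return None
    return phi


u_labels = {"a": 0, "b": 2, "c": 3}
w_labels = {1: 0, 2: 3, 3: 2}
phi = rooted_isomorphism(u_labels, [("a", "b"), ("a", "c")], "a",
                         w_labels, [(1, 2), (1, 3)], 1)
print(sorted(phi.items()))  # [('a', 1), ('b', 3), ('c', 2)]
```

Because every step is forced, the procedure runs in time proportional to the size of the complexes, in contrast to searching over all vertex bijections.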

It is now easy to remove the entire interior portion of the active zone U (i.e., the part not lying on the boundary L(U)) and replace it by the subcomplex $W_i'$ of the complex $W_i$ which includes all the elements of this complex except the vertices identified earlier. Thus we "insert" the new complex $W_i'$ into the information complex in question in place of the internal portion of its active zone, while retaining the boundary of the active zone unchanged.

Since in the complex $W_i'$ the initial vertex occupies a new position with respect to the boundary of the previous active zone, the new active zone determined after the insertion will have a different boundary. The new information complex S' obtained after such an insertion is the result of applying the direct processing operator $R_A$ of the algorithm A to the original information complex S. The direct processing operator is applied to the resulting information complexes until a complex is obtained whose initial vertex is designated by the letter $T_1$ and not by the letter $T_0$.

Such a complex is termed a terminal complex, and its maximal connected subcomplex containing the initial vertex $T_1$ is considered to be the solution, i.e., the information complex obtained as the result of the action of the algorithm A on the initial (input) information complex $S_0$. If, however, the algorithm continues to operate indefinitely without obtaining a terminal complex at any step, then, just as in the case of normal algorithms, we take it that the algorithm in question is not applicable to the given initial complex $S_0$.

We can extend the definition of the algorithm so as to permit, in the right side of the correspondence table, a complex without an initial vertex. Applying a substitution with such a right side leads to a natural termination of the algorithmic process, since determining the active zone and making further substitutions become impossible.

However, since the terminal complex (in the sense defined above) 
does not appear, again in this case the algorithmic process must be 
considered to have terminated without result and the algorithm is con- 
sidered inapplicable to the corresponding initial information complex. 

Still another type of unsuccessful termination of the algorithmic process is possible when the correspondence table does not contain all the forms of active zones which are possible for a given algorithm. If the information complex reaches a state in which, although there is an initial vertex designated by the letter $T_0$, none of the substitutions of the correspondence table is applicable, it is also considered that the algorithm is not applicable to the corresponding initial information complex.

We must make one more remark on the nature of the substitutions in the correspondence table. Unless special measures are taken, the substitutions may violate the condition "a" introduced above, which must be satisfied by all the information complexes we are considering. To avoid such a distortion of the information, it is clearly sufficient to assume that any vertex of an arbitrary complex $W_i$ from the right side of the correspondence table which, in the process of "insertion", is identified with some vertex q on the boundary of the active zone can be connected by lines only with the initial vertex and with vertices designated by the same letters as the vertices with which the vertex corresponding to q is connected in the complex $U_i$.

This condition (we term it "b") does not violate the generality of our considerations. The boundary used in performing the "insertion" operation is defined quite arbitrarily. If we included in the boundary not only the vertices at distance P from the initial vertex (connected with it by chains of length P but not by chains of lesser length) but also the vertices at distance P − 1, then, by establishing the isomorphism of the boundaries in the complexes $U_i$ and $W_i$, we would obtain, as it is not difficult to see, a stronger limitation on the correspondence table than the one imposed by condition "b".

Careful analysis of the description of the Kolmogorov-Uspenskiy algorithmic diagram shows that its form closely resembles what a man actually does when he processes information supplied to him externally in accordance with the rules of an algorithm which he has memorized. The developers of this diagram took special measures not to lose generality in the nature of the transformation performed. Nevertheless, they demonstrated that their diagram makes it possible to construct only normalizable algorithms. This result can be considered a confirmation of the normalization principle formulated in §2.
§4. OTHER THEORETICAL ALGORITHMIC SYSTEMS

Historically, the first algorithmic system to receive a fairly complete and thorough development was the system based on constructively defined arithmetic (integer-valued) functions, which were given the name recursive functions. The use of these functions in the theory of algorithms is based on the idea of numbering the words in any alphabet by the successive natural numbers. This numbering can be done most simply by arranging the words in order of increasing length, and arranging words of the same length in some fixed (for example, lexicographic) order.

After the input and output words of an arbitrary alphabetic operator have been numbered, this operator is transformed into an operator y = f(x) in which both the argument x and the function y itself take nonnegative integral values. The function f(x), of course, need not be defined for all values of the argument x, but only for certain values of x, which constitute the domain of definition of this function. Such partially defined integer-valued functions of integer arguments are usually termed arithmetic functions for brevity.
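The numbering by length and then lexicographically is a bijection between words and the natural numbers, and it has a closed form: reading the letters as digits 1, ..., k of a base-k numeral (so-called bijective base-k notation). The Python sketch below illustrates this; the function names and the reversal operator are our own examples.

```python
def word_to_number(word: str, alphabet: str) -> int:
    """Position of the word when words are listed by length, then lexicographically."""
    k = len(alphabet)
    index = {letter: i for i, letter in enumerate(alphabet)}
    n = 0
    for letter in word:
        n = n * k + index[letter] + 1  # digits 1..k (bijective base-k notation)
    return n


def number_to_word(n: int, alphabet: str) -> str:
    """Inverse of word_to_number."""
    k = len(alphabet)
    letters = []
    while n > 0:
        n, r = divmod(n - 1, k)
        letters.append(alphabet[r])
    return "".join(reversed(letters))


print([number_to_word(i, "ab") for i in range(7)])
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']


# An alphabetic operator (here: word reversal) becomes an arithmetic function f(x).
def f(x: int) -> int:
    return word_to_number(number_to_word(x, "ab")[::-1], "ab")


print(f(4))  # 5, since word 4 is "ab" and its reversal "ba" is word 5
```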

Among the arithmetic functions we single out the following particularly simple functions, which we shall term elementary arithmetic functions: the function identically equal to zero (defined for all nonnegative integral values of the arguments); the identity functions $f(x_1, \ldots, x_n) = x_i$, which repeat the values of their arguments; and the direct succession function f(x) = x + 1, which is also defined for all nonnegative integral values of its argument.

Using the elementary arithmetic functions just listed as the original functions, we can, with the aid of a small number of general constructive techniques, construct more and more complex arithmetic functions. In the theory of recursive (constructive arithmetic) functions three operations are of particularly great importance: superposition, primitive recursion and the least root operation.

The operation of superposition of functions consists in substituting some arithmetic functions in place of the arguments of other arithmetic functions. Thus, from already known functions we can construct new arithmetic functions. For example, performing the superposition of the functions f(x) = 0 and g(x) = x + 1, we arrive at the function h(x) = g(f(x)) = 1. Superposing the function g(x) with itself gives the function p(x) = x + 2, etc.
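The elementary functions and superposition translate directly into Python functions and a composition combinator. This is a hypothetical sketch (the names `zero`, `successor`, `identity` and `superpose` are ours) that reproduces the two examples h(x) = 1 and p(x) = x + 2.

```python
from typing import Callable

Fn = Callable[..., int]


def zero(*xs: int) -> int:
    """The function identically equal to zero (any number of arguments)."""
    return 0


def successor(x: int) -> int:
    """Direct succession function x + 1."""
    return x + 1


def identity(i: int, n: int) -> Fn:
    """Identity function f(x_1, ..., x_n) = x_i (1-based index i)."""
    def f(*xs: int) -> int:
        return xs[i - 1]
    return f


def superpose(f: Fn, *gs: Fn) -> Fn:
    """The function x -> f(g_1(x), ..., g_m(x))."""
    return lambda *xs: f(*(g(*xs) for g in gs))


h = superpose(successor, zero)       # h(x) = 1
p = superpose(successor, successor)  # p(x) = x + 2
print(h(5), p(5))  # 1 7
```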

The operation of primitive recursion makes it possible to construct an n-place arithmetic function (a function of n arguments) from two given functions, one of which is (n − 1)-place and the other (n + 1)-place. The construction is determined by the following two relations:

$$f(x_1, \ldots, x_{n-1}, 0) = g(x_1, \ldots, x_{n-1}), \qquad (6)$$

$$f(x_1, \ldots, x_{n-1}, y + 1) = h(x_1, \ldots, x_{n-1}, y, f(x_1, \ldots, x_{n-1}, y)), \qquad (7)$$

where $f(x_1, \ldots, x_{n-1}, y)$ is the function being determined and g and h are the given functions.

For a proper understanding of the operation of primitive recursion we must note that every function of a smaller number of variables can be considered as a function of any larger number of variables. In particular, constant functions, which it is natural to consider as functions of zero arguments, can if desired be considered as functions of any finite number of arguments.

As an example, let us consider how the operation of primitive recursion is applied to construct the two-place summation function f(x, y) = x + y from the elementary arithmetic functions. This function is determined with the aid of the identity function g(x) = x and the direct succession function h(x) = x + 1 (regarded, as just noted, as a function of the three arguments of relation (7)):

$$f(x, 0) = x = g(x);$$

$$f(x, y + 1) = (x + y) + 1 = h(f(x, y)).$$

We can construct similarly the product, exponential, power and 
other widely known arithmetic functions. 
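Relations (6) and (7) can be sketched as a Python combinator. The recursion is unrolled into a loop that computes f(x..., 0), f(x..., 1), ..., f(x..., y) in turn, which yields the same values. The names are hypothetical, and the product is our illustration of the "similarly" above.

```python
from typing import Callable

Fn = Callable[..., int]


def primitive_recursion(g: Fn, h: Fn) -> Fn:
    """f with f(x..., 0) = g(x...) and f(x..., y+1) = h(x..., y, f(x..., y))."""
    def f(*args: int) -> int:
        *xs, y = args
        value = g(*xs)               # relation (6)
        for i in range(y):           # relation (7), unrolled as a loop
            value = h(*xs, i, value)
        return value
    return f


# Sum: g(x) = x, h(x, y, z) = z + 1 (the successor applied to the last argument).
add = primitive_recursion(lambda x: x, lambda x, y, z: z + 1)
# Product: x * 0 = 0, x * (y + 1) = x * y + x.
mul = primitive_recursion(lambda x: 0, lambda x, y, z: add(z, x))
print(add(3, 4), mul(3, 4))  # 7 12
```

Applying the same pattern once more, with g(x) = 1 and h(x, y, z) = mul(z, x), would give the power function.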

The functions which can be constructed from the elementary arithmetic functions by applying the operations of superposition and primitive recursion any (finite) number of times in any sequence are termed primitive recursive functions.

Most of the arithmetic functions encountered in practice are primitive recursive. Nevertheless, the primitive recursive functions do not include all the arithmetic functions which can be defined constructively. To construct all such functions, other operations are used, in particular the least root operation.

The least root operation makes it possible to determine a new arithmetic function $f(x_1, \ldots, x_n)$ of n variables with the aid of a previously constructed arithmetic function $g(x_1, \ldots, x_n, y)$ of n + 1 variables. For any given set of values of the variables $x_1 = a_1, \ldots, x_n = a_n$ we take as the corresponding value $f(a_1, \ldots, a_n)$ of the function being determined the least nonnegative integral root $y = \beta$ of the equation $g(a_1, \ldots, a_n, y) = 0$. If this equation has no nonnegative integral roots, the function $f(x_1, \ldots, x_n)$ is considered undefined for the corresponding set of values of the variables. Usually it is also presumed that the function $f(x_1, \ldots, x_n)$ is undefined on the set $x_1 = a_1, \ldots, x_n = a_n$ if, although the least root $y = \beta$ of the equation $g(a_1, \ldots, a_n, y) = 0$ exists, the function $g(a_1, \ldots, a_n, y)$ is undefined for at least one nonnegative integral value $y = \gamma$ satisfying $0 \le \gamma \le \beta - 1$.
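The least root operation is an unbounded search over y = 0, 1, 2, .... The Python sketch below follows the definition just given, modelling "undefined" by None (our convention). The optional cutoff `max_y` is a hypothetical addition for experiments only; without it the search, like the operation itself, may never end.

```python
from typing import Callable, Optional

PartialFn = Callable[..., Optional[int]]  # None models "undefined" (our convention)


def least_root(g: PartialFn, max_y: Optional[int] = None) -> PartialFn:
    """f(x...) = least y >= 0 with g(x..., y) = 0, per the definition above."""
    def f(*xs: int) -> Optional[int]:
        y = 0
        while max_y is None or y <= max_y:
            value = g(*xs, y)
            if value is None:  # g undefined below the least root: f undefined
                return None
            if value == 0:
                return y
            y += 1
        return None            # cutoff reached (an artefact of the sketch)
    return f


# Integer square root: the least y with (y + 1)^2 > x.
isqrt = least_root(lambda x, y: 0 if (y + 1) * (y + 1) > x else 1)
# Difference x - z: the least y with z + y = x; undefined when z > x.
diff = least_root(lambda x, z, y: 0 if z + y == x else 1, max_y=1000)
print(isqrt(10), diff(7, 3), diff(3, 7))  # 3 4 None
```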
The arithmetic functions which can be constructed from the elementary arithmetic functions with the aid of the operations of superposition, primitive recursion and least root are termed partial recursive functions. If, in addition, these functions are everywhere defined, they are termed general recursive functions.
In this definition, just as in the definition of the primitive recursive functions, it is permitted to perform all admissible operations in any sequence and any finite number of times. There exists, however, a result of Kleene [41] which makes it possible to obtain any partial recursive function from two primitive recursive functions by applying to them, in sequence, a single least root operation and a single superposition operation.
This result can be formulated more exactly as follows: for any partial recursive function $f(x_1, x_2, \ldots, x_n)$ there exist two primitive recursive functions $g(x_1, x_2, \ldots, x_n, y)$ and $h(x)$ such that the function $f(x_1, x_2, \ldots, x_n)$ can be obtained from them in the form

$$f(x_1, x_2, \ldots, x_n) = h(\mu_y[g(x_1, x_2, \ldots, x_n, y) = 0]),$$

where $\mu_y$ is the least root operator. Here the function $h(x)$ can be chosen once and for all, regardless of the choice of $f$.
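A toy Python sketch of the shape $f = h(\mu_y[g = 0])$ in Kleene's result: here $h$ is fixed once and for all as the decoder of the first component of the Cantor pairing function, and only $g$ depends on the function being computed. In Kleene's actual theorem the second component of $y$ codes an entire computation; in this hypothetical example (integer square root) it is a trivial witness, so the sketch shows only the form of the result, not its proof.

```python
import math
from typing import Callable, Tuple

def pair(a: int, b: int) -> int:
    """Cantor pairing: a bijection between pairs of naturals and naturals."""
    return (a + b) * (a + b + 1) // 2 + b

def unpair(z: int) -> Tuple[int, int]:
    """Inverse of pair()."""
    w = (math.isqrt(8 * z + 1) - 1) // 2     # w = a + b
    b = z - w * (w + 1) // 2
    return w - b, b

def h(z: int) -> int:
    """Chosen once and for all: the first component of the pair coded by z."""
    return unpair(z)[0]

def mu(g: Callable[[int, int], int], x: int) -> int:
    """Least root operator: least y >= 0 with g(x, y) == 0 (may run forever)."""
    y = 0
    while g(x, y) != 0:
        y += 1
    return y

def g_isqrt(x: int, y: int) -> int:
    """Toy g: 0 iff y codes the pair (v, 0) with v the integer square root of x."""
    v, w = unpair(y)
    return 0 if v * v <= x < (v + 1) * (v + 1) and w == 0 else 1

def f(x: int) -> int:
    return h(mu(g_isqrt, x))                  # f(x) = h(mu_y[g(x, y) = 0])
```

Only `g_isqrt` would change for a different function; `h` and `mu` stay the same.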

The partial recursive functions form the most general class of constructively definable arithmetic functions. They include, in particular, all the arithmetic functions which can be given in the form of finite recursive schemes of arbitrary form. By a finite recursive scheme we understand here any finite system of equalities $r = s$, where $r$ and $s$ are finite expressions (containing a finite number of symbols) constructed from known primitive recursive functions and unknown functions with numerical and literal arguments; the values of the unknown functions for any given values of the arguments must be determined uniquely after a finite number of steps (depending on the choice of the values of the arguments) as the result of applying two rules. The first rule (the substitution rule) consists in substituting into one of the given equalities some numerical value in place of one of the variables. The second rule (the replacement rule) makes it possible to use an equality of the form $x = f(x_1, x_2, \ldots, x_n)$, where $x, x_1, x_2, \ldots, x_n$ are numbers, to replace by the number $x$ some occurrence of the quantity $f(x_1, x_2, \ldots, x_n)$ in one of the equalities $r = s$.

It turns out that all the general recursive functions, and only such functions, can be represented in this manner. This makes it possible, following Herbrand and Gödel, to define the general recursive functions as the functions represented by finite recursive schemes of the form described above.
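A standard example of a finite recursive scheme that goes beyond primitive recursion is the Ackermann function (this example is not taken from the text above), defined by three equalities: $A(0, n) = n + 1$, $A(m + 1, 0) = A(m, 1)$, $A(m + 1, n + 1) = A(m, A(m + 1, n))$. Every value is determined after finitely many substitutions and replacements, so the function is general recursive, although it is known not to be primitive recursive. The Python transcription below (the name `ackermann` is ours) evaluates the scheme with an explicit stack of pending first arguments instead of native recursion.

```python
def ackermann(m: int, n: int) -> int:
    """Evaluate A(m, n) from the three defining equalities, using an explicit stack."""
    stack = [m]                 # pending outer calls A(m, .) still to be applied
    while stack:
        m = stack.pop()
        if m == 0:              # A(0, n) = n + 1
            n = n + 1
        elif n == 0:            # A(m, 0) = A(m - 1, 1)
            stack.append(m - 1)
            n = 1
        else:                   # A(m, n) = A(m - 1, A(m, n - 1))
            stack.append(m - 1)
            stack.append(m)
            n = n - 1
    return n
```

For instance, `ackermann(2, 3)` gives 9 and `ackermann(3, 3)` gives 61; the values grow far faster than those of any primitive recursive function as the first argument increases.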

If, while retaining the condition of single-valuedness, we do not require that the values of the functions appearing in the scheme be defined for all values of the arguments, we can represent the partial recursive functions by similar schemes. It is essential that no recursive definitions (using finite schemes) make it possible to go beyond the class of partial recursive functions.

After the input and output words have been numbered, any normal algorithm can be realized in the form of a partial recursive function. Conversely, any algorithm which is realizable with the aid of a partial recursive function is equivalent to some normal algorithm. Thus we can draw the following important conclusion.

An algorithm is normalizable if and only if it can be realized with the aid of a partial recursive function.

This proposition shows that the arithmetic (numerical) approach to the theory of algorithms likewise does not lead beyond the class of normalizable algorithms.
Let us consider two other approaches to the theory of algorithms, proposed in 1936 by Post [63] and Turing [73].

In the algorithmic system proposed by Post, the input and output information is represented in the standard binary form, while the algorithm is given as a finite ordered set of rules termed orders. For writing the input, output and intermediate information, use is made of a hypothetical infinite information tape divided into individual cells, each of which can hold only a single letter (the digit 0 or 1). Cells in which ones are written are termed signed, and those in which zeros are written are termed unsigned. At any instant during the operation of the algorithm only a finite number of cells can be signed.

The algorithm operates in discrete steps, at each of which one of the orders constituting the algorithm is performed. To each step there corresponds a definite active cell on the information tape. Some initial cell is fixed as the active cell for the first order. Further changes of the location of the active cell on the tape must be provided for in the algorithm itself. The orders which constitute the algorithm can be of the following six types.

First type. Flag the active cell of the tape (write one in it) and go to the performance of the i-th order (i can be any of the numbers used to number the orders of the algorithm).

Second type. Erase the flag of the active cell (write zero in it) 
and go to the performance of the i-th order. 

Third type. Shift the active cell one step to the right and go to 
the performance of the i-th order. 

Fourth type. Shift the active cell one step to the left and go to 
the performance of the i-th order. 


Fifth type. If the active cell is signed (one is written there), go to the performance of the j-th order, and if the active cell is not signed (zero is written there), go to the performance of the i-th order.

Sixth type. Stop: terminate the operation of the algorithm.
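The six order types can be made concrete with a small interpreter. The Python sketch below is only an illustration: the tuple encoding of orders, the function name `run_post` and the step budget `max_steps` (which turns "runs forever" into a `None` result) are hypothetical, not Post's own notation. The tape is stored as the finite set of signed cells.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical encoding of Post orders (order numbers start at 1):
#   ("mark", i)       first type:  write 1 in the active cell, go to order i
#   ("erase", i)      second type: write 0 in the active cell, go to order i
#   ("right", i)      third type:  shift the active cell right, go to order i
#   ("left", i)       fourth type: shift the active cell left, go to order i
#   ("branch", j, i)  fifth type:  go to order j if the cell is signed, else to order i
#   ("stop",)         sixth type:  terminate
Order = Tuple[object, ...]

def run_post(program: Dict[int, Order], signed: Set[int], cell: int = 0,
             max_steps: int = 10_000) -> Optional[Set[int]]:
    """Run from order 1; return the set of signed cells at the stop, or None."""
    tape = set(signed)                 # only finitely many cells are signed
    current = 1
    for _ in range(max_steps):
        order = program[current]
        kind = order[0]
        if kind == "stop":
            return tape
        if kind == "branch":
            current = order[1] if cell in tape else order[2]
            continue
        if kind == "mark":
            tape.add(cell)
        elif kind == "erase":
            tape.discard(cell)
        elif kind == "right":
            cell += 1
        elif kind == "left":
            cell -= 1
        else:
            raise ValueError(f"unknown order type: {kind!r}")
        current = order[1]
    return None                        # step budget exhausted: no result

# Append one signed cell to the right end of a block of signed cells.
append_mark = {
    1: ("branch", 2, 3),   # on a signed cell keep moving, otherwise sign it
    2: ("right", 1),
    3: ("mark", 4),
    4: ("stop",),
}
```

Starting on the left end of the block `{0, 1, 2}` (the number 3 in unary), `run_post(append_mark, {0, 1, 2})` stops with the signed cells `{0, 1, 2, 3}`.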

Algorithms composed of any finite number of orders of the types described are called Post algorithms. It has been shown that the Post algorithms reduce to the algorithms realizable with the aid of partial recursive functions and, conversely, any partial recursive function can be represented by an algorithm of the Post system. Thus we can formulate the following proposition.

The class of all algorithms equivalent to the Post algorithms co- 
incides with the class of all normalizable algorithms. 

The algorithmic scheme proposed by Turing [73] at the same time as Post's is quite close to the scheme just described. In the Turing scheme, which is customarily termed the Turing machine, the information is also recorded on an information tape that is infinite in both directions and divided into individual cells. However, in contrast with the Post algorithm, an arbitrary finite alphabet may be used here for writing the information. Each cell of the information tape holds a single letter. This letter can be scanned by a sensing element, the so-called head of the Turing machine, which can move along the information tape in both directions. The head of the Turing machine can be in any of a finite number of different states $q_1, q_2, \ldots, q_m$, can print in the scanned cell any of the letters $x_1, x_2, \ldots, x_n$, and can shift along the information tape by one cell to the right or to the left.

The algorithm realized by the Turing machine is written with the aid of the operating program of this machine, which is a set of groups of five symbols of the form $q_i x_j x_k q_l s_p$. Such a group of five symbols signifies that the head of the Turing machine, being in the state $q_i$ and sensing the letter $x_j$ recorded on the tape, prints in place of this letter the new letter $x_k$ (which in a particular case may coincide with the previously recorded letter $x_j$), passes into the new state $q_l$ (which may also coincide with the previous state) and makes a shift along the tape of magnitude $s_p$ equal to $\pm 1$.

The original scheme of the Turing machine was intended for writing out the values taken by an arbitrary single-place partial recursive function for the values of the argument 0, 1, 2, .... In this case, of course, the machine must operate infinitely long. We can construct a Turing machine which computes the values of any partial recursive function given in advance. It is advisable, however, to modify the original scheme of the Turing machine described above. Let us assume that the last symbol $s_p$ of the group of five symbols describing the operation of the Turing machine can take, in addition to the values $\pm 1$ introduced above, a third value, "stop machine". With this addition the Turing machine is converted into an ordinary algorithmic system. It either processes the input word p initially written on the tape infinitely long, or stops after a finite number of transformation steps. In the first case it is presumed, as usual, that the algorithm realized by the machine is not applicable to the input word p. In the second case the information remaining on the tape at the instant the machine stops is taken as the output word into which the machine transforms the given input word p. In this case, of course, the alphabet used for recording information on the tape must contain a special empty letter to designate those cells in which no information is written.
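The modified Turing machine, with "stop" as a third possible shift, can be simulated in a few lines of Python. In this sketch the dictionary encoding of the program, the blank symbol `"_"` (playing the role of the empty letter), the function name `run_turing` and the step budget are all assumptions made for illustration; a missing entry in the program simply raises `KeyError`.

```python
from typing import Dict, Optional, Tuple, Union

Shift = Union[int, str]                       # +1, -1 or the value "stop"
# Hypothetical encoding of the groups q_i x_j x_k q_l s_p:
# program[(q_i, x_j)] = (x_k, q_l, s_p)
Program = Dict[Tuple[str, str], Tuple[str, str, Shift]]

def run_turing(program: Program, word: str, state: str, blank: str = "_",
               max_steps: int = 10_000) -> Optional[str]:
    """Return the output word, or None if the machine has not stopped in max_steps."""
    tape = dict(enumerate(word))              # cell index -> letter
    head = 0                                  # the head starts on the first letter
    for _ in range(max_steps):
        letter = tape.get(head, blank)
        new_letter, state, shift = program[(state, letter)]
        tape[head] = new_letter
        if shift == "stop":
            cells = "".join(tape.get(i, blank) for i in range(min(tape), max(tape) + 1))
            return cells.strip(blank)         # drop the empty cells at both ends
        head += shift
    return None

# Binary increment: run to the right end, then propagate the carry leftwards.
increment: Program = {
    ("right", "0"): ("0", "right", +1),
    ("right", "1"): ("1", "right", +1),
    ("right", "_"): ("_", "carry", -1),       # step back onto the last digit
    ("carry", "1"): ("0", "carry", -1),       # 1 + carry = 0, carry moves on
    ("carry", "0"): ("1", "done", "stop"),
    ("carry", "_"): ("1", "done", "stop"),    # carry out of the leftmost digit
}
```

With this program, `run_turing(increment, "1011", "right")` returns `"1100"` (11 + 1 = 12 in binary).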
We can show that all algorithms realizable with the aid of the described modifications of the Turing machine are normalizable and, conversely, any normalizable algorithm can be realized with the aid of a Turing machine specially constructed for this purpose. By writing the programs of Turing machines and their input words in some standard alphabet, we can construct a universal Turing machine by exactly the same method used in constructing the universal normal algorithm (§2). Given the representation of the program of any Turing machine M and the representation of any input word p, the universal Turing machine produces the representation of the output word q into which the machine M transforms the input word p. If, however, the algorithm realized by the machine M is not applicable to the word p (the machine M works on it infinitely long), then the algorithm realized by the universal Turing machine is also not applicable to the word formed from the representation of the word p and the program of the machine M.

Thus, despite considerable qualitative differences, all the algorithmic systems described lead in essence (up to equivalence) to the same class of algorithms. This conclusion is one more confirmation that the modern theory of algorithms embraces an extremely broad class (if not all) of constructively definable alphabetic operators.

§5. THE CONCEPT OF ALGORITHMICALLY INSOLUBLE PROBLEMS 

Every algorithm is a method of solving some mass problem, that is, a problem which can be formulated as the processing not of one, but of an entire set of input words into the corresponding output words.

Since both the statement and the solution of any problem can be expressed in the form of individual words, every algorithm can be considered as a universal method for solving an entire class of problems.

A detailed analysis shows that there also exist classes of problems for whose solution there is not, and cannot be, a single universal technique. Problems of this kind are termed algorithmically insoluble. However, the algorithmic insolubility of a particular class of problems does not at all imply that any specific problem of this class cannot be solved. The point is that it is impossible to solve all problems of the given class by one and the same technique.

For a better understanding of algorithmic insolubility, we shall present examples of algorithmically soluble and algorithmically insoluble problems.

A typical example of an algorithmically soluble problem is that of proving identities in ordinary algebra. For simplicity we shall limit ourselves to the case when the identities are constructed from rational numbers and letters (denoting variables) with the aid of the operations of addition, subtraction and multiplication. The following general technique for solving this problem is well known from the school algebra course: using the distributive law for multiplication, we remove the parentheses in the left and right sides of the given identity and reduce like terms in accordance with the well-known rules. After all these transformations, both the left and right sides of the original identity are transformed into polynomials. The identity is valid if and only if these polynomials coincide identically. In other words, the validity of the identity means that after all the terms of the transformed identity are transferred to one side, these terms cancel one another, and the identity is converted into the trivial identity 0 = 0.
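The school technique just described can be sketched in Python, assuming polynomials are represented as maps from monomials to rational coefficients (the class `Poly` and the function `is_identity` are hypothetical names). Every product is expanded by the distributive law as soon as it is formed, like terms are reduced by adding coefficients, and the check moves all terms to one side and tests whether the trivial identity 0 = 0 remains.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Dict, Tuple, Union

Monomial = Tuple[Tuple[str, int], ...]       # sorted (variable, exponent) pairs

def _mul_monomials(a: Monomial, b: Monomial) -> Monomial:
    exps = dict(a)
    for var, e in b:
        exps[var] = exps.get(var, 0) + e     # x^i * x^j = x^(i+j)
    return tuple(sorted(exps.items()))

class Poly:
    """Polynomial with rational coefficients, always kept in expanded form."""

    def __init__(self, terms: Dict[Monomial, Fraction]) -> None:
        # Like terms are already merged; drop zero coefficients.
        self.terms = {m: c for m, c in terms.items() if c != 0}

    @staticmethod
    def var(name: str) -> "Poly":
        return Poly({((name, 1),): Fraction(1)})

    @staticmethod
    def lift(value: Union["Poly", int, Fraction]) -> "Poly":
        return value if isinstance(value, Poly) else Poly({(): Fraction(value)})

    def __add__(self, other):
        out = defaultdict(Fraction, self.terms)
        for m, c in Poly.lift(other).terms.items():
            out[m] += c                      # reduction of like terms
        return Poly(out)

    def __neg__(self):
        return Poly({m: -c for m, c in self.terms.items()})

    def __sub__(self, other):
        return self + (-Poly.lift(other))

    def __rsub__(self, other):
        return Poly.lift(other) - self

    def __mul__(self, other):
        other = Poly.lift(other)
        out = defaultdict(Fraction)
        for m1, c1 in self.terms.items():    # distributive law: multiply
            for m2, c2 in other.terms.items():   # every term by every term
                out[_mul_monomials(m1, m2)] += c1 * c2
        return Poly(out)

    __radd__ = __add__
    __rmul__ = __mul__

def is_identity(lhs: Poly, rhs: Poly) -> bool:
    """Transfer all terms to one side; an identity leaves exactly 0 = 0."""
    return not (Poly.lift(lhs) - Poly.lift(rhs)).terms
```

For example, with `x, y = Poly.var("x"), Poly.var("y")`, the call `is_identity((x + y) * (x + y), x * x + 2 * x * y + y * y)` returns `True`.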
Thus, the identity problem in elementary algebra is algorithmically solvable: there exists a single constructive technique which makes it possible, after a finite number of steps, to decide whether any given relation is an identity. We can, however, construct examples of algebraic systems in which the identity problem is algorithmically insoluble. As such algebraic systems we might take, for example, semigroups or groups given by systems of generating elements and defining relations. Examples of semigroups with an insoluble identity problem were first found by Post [64], and corresponding examples for groups were found by Novikov [60].

Without writing out the defining relations explicitly, we shall clarify the essence of these examples. Let $x_1, x_2, \ldots, x_n$ be the letters of some finite alphabet. The set of all words in this alphabet, including the empty word $e$, is termed the free semigroup with the generating elements $x_1, x_2, \ldots, x_n$ if for arbitrary pairs of words $p$, $q$ a multiplication operation is introduced which consists simply in appending one word to the other. We agree to denote the free semigroup with generating elements $x_1, x_2, \ldots, x_n$ by $F(x_1, x_2, \ldots, x_n)$, and the result of multiplying the word $p$ by the word $q$ we denote by $pq$.

In the free semigroup we can introduce any set of defining relations, which are formal equalities between two nonidentical words: $p_i = q_i$ $(i = 1, 2, \ldots)$. Two words in the free semigroup $F$ with a given system $S$ of defining relations are termed identical, or mutually equivalent, if one of them can be obtained from the other by an arbitrary number of substitutions of the right sides of the defining relations in place of the left sides and, conversely, of the left sides in place of the right sides. For example, in the semigroup with the system of generators $(x, y)$ and the single defining relation $xy = yx$, the words $p = xxy$ and $q = yxx$ are identical, since the first word can be obtained from the second as the result of two substitutions of the form described above: $q = yxx \to xyx \to xxy = p$. Using the reverse substitutions, this chain can be read in the reverse direction, which makes it possible to transform not only the word $q$ into the word $p$ but also the word $p$ into the word $q$.

The word identity problem for semigroups is formulated as follows.

Assume that in an arbitrary free semigroup $F$ with a finite number of generators there is given a system $S$ consisting of a finite number of defining relations. We are required to find a single constructive technique which makes it possible, after a finite number of steps, to decide whether any two given words of the semigroup $F$ with the system of defining relations $S$ are identical or nonidentical.

For some systems of defining relations the problem formulated is solvable; however, as Post [64] has shown, there also exist systems of defining relations for which the word identity problem is algorithmically insoluble. This does not mean, of course, that it is impossible to establish the identity or nonidentity of any fixed specific pair of words. It means only that there is no single technique for establishing the identity of an arbitrary pair of words, similar to the technique described above for proving the validity or nonvalidity of any relation in elementary algebra.
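Although no single technique works for all systems, one can still search. The Python sketch below (the names `one_step` and `identical` and the budget `max_words` are hypothetical) explores, breadth first, all words reachable from p by substitutions in either direction. It confirms identity whenever the two words are identical and the budget is large enough, reports nonidentity only when the equivalence class of p turns out to be finite (as happens when every relation preserves word length), and otherwise gives up with `None`. This one-sided behavior is exactly what the algorithmic insolubility of the general problem allows.

```python
from collections import deque
from typing import Iterator, List, Optional, Tuple

def one_step(word: str, relations: List[Tuple[str, str]]) -> Iterator[str]:
    """All words obtained by one substitution of one side of a relation for the other."""
    for left, right in relations:
        for src, dst in ((left, right), (right, left)):    # both directions
            start = word.find(src)
            while start != -1:
                yield word[:start] + dst + word[start + len(src):]
                start = word.find(src, start + 1)

def identical(p: str, q: str, relations: List[Tuple[str, str]],
              max_words: int = 100_000) -> Optional[bool]:
    """True if q is reached from p, False if p's class is exhausted, None if undecided."""
    seen = {p}
    queue = deque([p])
    while queue:
        if len(seen) > max_words:
            return None                    # budget exhausted: the question stays open
        word = queue.popleft()
        if word == q:
            return True
        for nxt in one_step(word, relations):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False                           # the whole (finite) class of p was explored
```

With the single relation $xy = yx$ from the example above, `identical("yxx", "xxy", [("xy", "yx")])` returns `True`, and `identical("xy", "xx", [("xy", "yx")])` returns `False`.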

The word identity problem for groups coincides in its basic features with the corresponding problem for semigroups. The free group $G$ with the generating elements $x_1, x_2, \ldots, x_n$ is constructed as the set of words composed of the letters $x_1, x_2, \ldots, x_n$ and the "inverse" letters $x_1^{-1}, x_2^{-1}, \ldots, x_n^{-1}$. In this case two mutually inverse letters standing side by side cancel one another (become equivalent to the empty word):

$$x_i x_i^{-1} = x_i^{-1} x_i = e \qquad (i = 1, 2, \ldots, n). \qquad (8)$$

In determining whether two words in a group with the system of defining relations $S$ are identical, we must take into account not only the relations appearing in this system but also the relations of the form (8). Just as for semigroups, the word identity problem for groups specified by a finite number of generators and defining relations is algorithmically insoluble in the general case. Examples of groups with an insoluble word identity problem were first constructed by Novikov [60].
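In the free group itself, with no defining relations beyond (8), the identity problem is solvable: two words are identical exactly when they have the same freely reduced form, obtained by cancelling adjacent inverse letters until none remain. A minimal Python sketch, assuming (as a convention of this example only) that a lowercase letter denotes a generator and the corresponding uppercase letter its inverse:

```python
def inverse(letter: str) -> str:
    # Assumed convention: "a" and "A" denote mutually inverse letters.
    return letter.swapcase()

def free_reduce(word: str) -> str:
    """Apply the cancellations (8) until no adjacent inverse pair remains."""
    stack = []
    for letter in word:
        if stack and stack[-1] == inverse(letter):
            stack.pop()            # x x^-1 or x^-1 x cancels to the empty word
        else:
            stack.append(letter)
    return "".join(stack)

def identical_in_free_group(u: str, v: str) -> bool:
    return free_reduce(u) == free_reduce(v)
```

A single left-to-right pass with a stack suffices because each cancellation can only expose a new adjacent pair at the top of the stack; for example, `free_reduce("aaAbBA")` gives the empty word.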

How can the algorithmic insolubility of a particular problem be proved? The classical example of such an insoluble problem is the problem of recognizing the selfapplicability of algorithms. To formulate this problem exactly, we shall consider only normal algorithms in alphabets consisting of at least two letters. Under this assumption we can, without loss of generality, stipulate that two of the letters of the alphabet of any algorithm we deal with are identified with the two letters (0 and 1) of the standard binary alphabet. Given this condition, for any algorithm A under consideration, its representation $A^u$ in the standard binary alphabet can be regarded as an input word of this algorithm. If the word $A^u$ belongs to the domain of definition of the algorithm A, the algorithm is termed selfapplicable; otherwise it is termed nonselfapplicable.

Both selfapplicable and nonselfapplicable algorithms exist. An example of a selfapplicable (normal) algorithm is the so-called identity algorithm in any alphabet X containing two or more letters. By definition this algorithm is applicable to any word p in the alphabet X and transforms every input word into itself. An example of a nonselfapplicable algorithm is the so-called zero algorithm in any finite alphabet X. It is given by a scheme consisting of the identity substitutions $y \to y$, where $y$ runs through all the letters of the alphabet X. Since such a substitution can be applied to any nonempty word again and again without end, the algorithm is not applicable to any nonempty input word, and in particular it is not applicable to its own representation.

The problem of recognizing the selfapplicability of algorithms consists in finding a single constructive technique which makes it possible, after a finite number of steps, to recognize from the scheme of any given algorithm A in some fixed algorithmic system (for example, the system of normal algorithms) whether the algorithm A is selfapplicable or not.

If we assume that the normalization principle formulated in §2 is valid, we can take the single constructive technique in question to be none other than a normal algorithm B, defined on every word p which is the representation of an arbitrary normal algorithm A, and transforming this word into one of two different fixed words $q_1$ and $q_2$ depending on whether the algorithm A is selfapplicable or not (the word $q_1$ is the code of the word "selfapplicable" and $q_2$ is the code of the word "nonselfapplicable").

The algorithm B must also be defined on every input word which is not the representation of any (normal) algorithm. Indeed, otherwise, having obtained no result after some (sufficiently large) number of steps of the algorithm, we would not know whether the input word is the representation of a selfapplicable or of a nonselfapplicable algorithm. It is clear also that the result of applying the algorithm B to any word which is not the representation of an algorithm must differ both from the word $q_1$ and from the word $q_2$.

Let us assume that an algorithm B with the indicated properties exists. In this case there exists a normal algorithm C in the same alphabet X as the algorithm B, defined on all those and only those words in the alphabet X which are the representations of nonselfapplicable algorithms (we recall that, by the very definition of the algorithm B, the alphabet X contains the standard binary alphabet).

Indeed, let us construct a normal algorithm D in the alphabet X whose domain of definition consists of the single word $q_2$. Such an algorithm can be given, for example, in the form of the (normalized) superposition of two normal algorithms $D_1$ and $D_2$, the first of which is given by a scheme consisting of the single terminal substitution $q_2 \to \cdot$ (replacing $q_2$ by the empty word), while the second is given by a scheme consisting of the substitutions $x_i \to x_i$, where $x_i$ runs through all the letters of the alphabet X. It is clear that the first algorithm transforms into the empty word only the word $q_2$, while the domain of definition of the second algorithm consists only of the empty word. Therefore the domain of definition of the superposition D of the algorithms $D_1$ and $D_2$ consists only of the word $q_2$, as required.

After constructing the algorithm D, forming its superposition with the algorithm B and normalizing this superposition, we arrive at a normal algorithm C in the alphabet X whose domain of definition consists of all those and only those words in the alphabet X which are representations of nonselfapplicable algorithms. However, this property of the algorithm C is intrinsically contradictory, since the algorithm C can be neither applicable nor nonapplicable to its own representation.

Indeed, in the first case the algorithm C would be applicable to its own representation and would therefore be selfapplicable. But this would contradict the fact that, by construction, the algorithm C is applicable only to the representations of nonselfapplicable algorithms. In the second case, being nonapplicable to its own representation, the algorithm C would be one of the nonselfapplicable algorithms. But then, by definition, the algorithm C would have to be applicable to its own representation, since it is applicable to the representations of all nonselfapplicable algorithms. Consequently, the algorithm C would be selfapplicable, which is again a contradiction.

Thus, the assumption that the problem of recognizing selfapplicability is algorithmically solvable leads to a logical contradiction and is therefore false, which proves the algorithmic undecidability of this problem.
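The diagonal construction can be mimicked in Python under strong simplifying assumptions: a "program" is a Python function, the representation of an algorithm is modeled by the function object itself, and nonapplicability (a computation that never ends) is modeled by returning a sentinel `LOOP` instead of actually running forever. All names are hypothetical. Whatever total candidate decider is supplied, the program C built from it disagrees with the decider on C itself; this illustrates the argument above, it does not replace the proof.

```python
from typing import Callable

# Sentinel standing for "the computation never terminates".
LOOP = object()

Program = Callable[[object], object]

def diagonal(claims_selfapplicable: Callable[[Program], bool]) -> Program:
    """Build C: applicable exactly to programs the decider calls nonselfapplicable."""
    def c(p: Program) -> object:
        return LOOP if claims_selfapplicable(p) else "halted"
    return c

def decider_fails(claims_selfapplicable: Callable[[Program], bool]) -> bool:
    """True when the candidate decider answers wrongly about C applied to C."""
    c = diagonal(claims_selfapplicable)
    predicted = claims_selfapplicable(c)
    actual = c(c) is not LOOP          # is C applicable to its own representation?
    return predicted != actual
```

Both trivial candidates, `lambda p: True` and `lambda p: False`, are refuted by `decider_fails`, and so is any other total candidate, for the reason given in the two cases of the proof.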

We have substantiated this conclusion only under the condition that the algorithm normalization principle is valid. However, the nature of the contradiction used to prove the algorithmic insolubility of the problem of recognizing the selfapplicability of algorithms is in fact more profound. A reader familiar with the paradoxes of set theory and of mathematical logic will easily note that this contradiction is of the same nature as the contradiction in the well-known paradox of Russell, which establishes the intrinsic inconsistency of the concept of "the set of all sets not containing themselves as an element."

This circumstance leads to the conclusion that the algorithmic undecidability of the problem of recognizing selfapplicability is not a consequence of the narrowness of the modern exact concept of algorithm. Even if we were able to construct an exact concept of algorithm which included certain nonnormalizable algorithms, the problem of recognizing the selfapplicability of algorithms would remain, as before, algorithmically undecidable.

From the algorithmic undecidability of the problem of recognizing the selfapplicability of algorithms, the algorithmic undecidability of a whole series of other problems is derived. The general method for these derivations consists in deducing, from the assumption that an algorithm solving a particular problem Q exists, the existence of an algorithm solving the problem of recognizing the selfapplicability of algorithms. Since the latter is impossible, an algorithm solving the problem Q is also impossible.

Using this general method, the algorithmic undecidability of many different problems has been proved, including the general word identity problems for groups and semigroups considered above. We shall mention some other algorithmically undecidable problems whose undecidability has been established by the same method. One of them is the problem of recognizing the applicability of an algorithm to a particular word. An algorithm A operating in some alphabet X can be constructed for which there is no algorithm, in the alphabet X or in any extension of it, that transforms into some fixed word those and only those words to which the algorithm A is not applicable.

The problem of constructing an algorithm which transforms into the fixed word p all the words to which a given algorithm A is applicable is, as is not difficult to see, algorithmically solvable: to solve it, it is sufficient to construct an algorithm B which transforms all words in the alphabet of the algorithm A into the word p, and to form the superposition of the algorithms A and B. We say that an algorithm annuls particular words if it transforms them into the empty word e. The problem of recognizing annulment for a given algorithm A consists in constructing an algorithm B (in the same alphabet as A) which annuls all those and only those words which the algorithm A does not annul. This problem is, in the general case, algorithmically undecidable: the algorithm A can be chosen so that an algorithm B with the indicated properties cannot be constructed.

Quite frequently, proofs of the algorithmic insolubility of particular problems make use of Post's proof [64] of the algorithmic insolubility of the following problem, which has been termed the Post combinatorial problem. Assume that in an arbitrary finite alphabet there is given an arbitrary finite system $S$ of pairs of words $(p_1, q_1), (p_2, q_2), \ldots, (p_n, q_n)$. We are required to construct a single constructive technique which makes it possible, for any such system $S$, after a finite number of steps, to answer the question of whether we can construct from the first elements of the pairs of the system $S$ a word $p_{i_1} p_{i_2} \cdots p_{i_k}$ $(k \ge 1)$ which coincides with the word $q_{i_1} q_{i_2} \cdots q_{i_k}$ constructed from the corresponding second elements of the same pairs.
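Since the problem is insoluble, no procedure can always answer "no"; but a solution, when one exists, can always be found by systematic search. The Python sketch below (the name `pcp_search` and the bound `max_len` are assumptions of the sketch) tries index sequences breadth first and discards any partial sequence in which neither concatenation is an initial segment of the other, since no extension of it can then succeed. It returns the 1-based indices $i_1, \ldots, i_k$ of a shortest solution, or `None` if there is none up to the bound.

```python
from collections import deque
from typing import List, Optional, Tuple

def pcp_search(pairs: List[Tuple[str, str]],
               max_len: int = 10) -> Optional[Tuple[int, ...]]:
    """Breadth-first search for i_1..i_k with p_{i_1}...p_{i_k} == q_{i_1}...q_{i_k}."""
    queue = deque([((), "", "")])        # (indices so far, top word, bottom word)
    while queue:
        seq, top, bottom = queue.popleft()
        if len(seq) == max_len:
            continue
        for i, (p, q) in enumerate(pairs, start=1):
            t, b = top + p, bottom + q
            # One word must be an initial segment of the other, or no extension can match.
            if not (t.startswith(b) or b.startswith(t)):
                continue
            if t == b:
                return seq + (i,)
            queue.append((seq + (i,), t, b))
    return None
```

For the system $(a, baa), (ab, aa), (bba, bb)$ the search returns `(3, 2, 3, 1)`, since $bba \cdot ab \cdot bba \cdot a = bb \cdot aa \cdot bb \cdot baa = bbaabbbaa$. A `None` result only means that no solution exists up to `max_len`.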

The problem of matrix representability is also algorithmically unsolvable. To formulate this problem, we say that a matrix $U$ is representable in terms of the matrices $U_1, U_2, \ldots, U_n$ if for some finite sequence (generally speaking, with repetitions) $U_{i_1}, U_{i_2}, \ldots, U_{i_k}$ of these matrices the product $U_{i_1} U_{i_2} \cdots U_{i_k}$ of all the matrices appearing in the sequence coincides with the given matrix $U$. The representability problem consists in finding a general constructive technique by which, after a finite number of steps, for any matrix $U$ and any finite system $S$ of matrices, we would be able to tell whether the matrix $U$ is representable in terms of the matrices of the system $S$ or not.
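As with the Post combinatorial problem, a representation, when one exists, can be found by exhaustive search, even though no general technique can settle every case. A minimal Python sketch, assuming integer matrices stored as tuples of rows (the names `matmul` and `represent` and the bound `max_len` are hypothetical):

```python
from collections import deque
from typing import List, Optional, Tuple

Matrix = Tuple[Tuple[int, ...], ...]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0])))
        for i in range(len(a))
    )

def represent(u: Matrix, system: List[Matrix],
              max_len: int = 8) -> Optional[Tuple[int, ...]]:
    """Shortest 1-based indices i_1..i_k (k <= max_len) with U_{i_1}...U_{i_k} == u."""
    queue = deque(((i,), m) for i, m in enumerate(system, start=1))
    while queue:
        seq, product = queue.popleft()
        if product == u:
            return seq
        if len(seq) < max_len:
            for i, m in enumerate(system, start=1):
                queue.append((seq + (i,), matmul(product, m)))
    return None
```

With $U_1 = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ and $U_2 = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}$, the matrix $\begin{pmatrix} 3 & 2 \\ 1 & 1 \end{pmatrix}$ is found as $U_1 U_1 U_2$. The search is exponential in `max_len`, and a `None` answer says nothing about longer products.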

We recall that the algorithmic undecidability of all the problems indicated has been proved on the assumption that the normalization principle is valid; however, as noted above, the nature of this undecidability is more profound and, in a certain sense, independent of this principle.
13 The word p is termed an initial segment of the word q if the word q has the form q = pr, where r is any word (possibly the empty word).


Chapter 2 
BOOLEAN FUNCTIONS AND PROPOSITIONAL CALCULUS 
§1. CONCEPT OF BOOLEAN FUNCTIONS 

Boolean (or switching) function is the term customarily given to those functions for which all the arguments, and the functions themselves, can take on only two values.

The role of boolean functions in cybernetics is determined by two basic circumstances. First, boolean functions are a convenient apparatus for describing the circuits of many information converters built on the discrete principle, since with current technology it is far easier to construct discrete elements operating directly in the binary alphabet than in some other alphabet. Second, boolean functions are widely used in mathematical logic, which is one of the foundations of the automation of complex thought processes.

The use of boolean variables, which have only two possible values, along with the usual variables which take numerical values, plays a significant role in the design of various kinds of practical algorithmic systems for programming electronic computers. Boolean functions can also be used successfully for the solution of certain general questions of the theory of algorithms, for example to refine the concept of algorithmic complexity.

The two possible values of the variables which appear in the definition of boolean functions can be designated arbitrarily. In practice, however, two notation systems are used most frequently. The first (used in applying boolean functions to the theory of automata circuits) assigns to the possible values of the boolean variables the notations 0 and 1. We shall call the symbols introduced zero and one, just as in the case of numerals, bearing in mind that here zero and one appear not as numerals, but only as convenient notations for the letters of the abstract binary alphabet. Later we shall assign to these symbols several properties which make it possible to treat them (with one exception) as ordinary numerals (this is precisely the convenience of the notation system under consideration). But all such properties must be precisely defined before they are used. In particular, we cannot yet make use of the properties of zero and one which result from the existence of the operations of addition and multiplication for numbers, since we have not yet defined these operations for these symbols.

In the second system of notation, the words "true" and "false" serve as the notations for the two possible values of the boolean variables. This system of notation is used in mathematical logic, primarily in the part called the propositional calculus. Its use is connected with the fact that in the propositional calculus the boolean variables are interpreted as propositional variables, considered from the point of view of the truth or falsity of the propositions.

In the present and the three following sections we shall use the
first system of notation without specifying this each time. When it is
necessary to pass from one system of notation to the other, we
stipulate that one corresponds to true and zero corresponds to false
(we could, of course, assume exactly the opposite correspondence).

Let us consider the boolean functions of any finite number of
arguments. If the number of arguments is equal to n, it is customary
to call the corresponding function n-place. Since each boolean variable
can take only two values, the domain of definition of any boolean
function is necessarily finite. It is easy to see that the domain of
definition of an n-place boolean function can consist of at most $2^n$
different elements, namely all possible sets of values of its n
arguments. We will usually order the arguments of a given boolean
function by assigning them the numbers 1, 2, ..., n. In this case a
set of values of the arguments is identified with some cortege (finite
ordered sequence) of zeros and ones. For example, the set of values
$x_1 = 1$, $x_2 = 0$, $x_3 = 0$ of the arguments of the three-place
boolean function $f(x_1, x_2, x_3)$ can be abbreviated as the cortege
100, and the set $x_1 = 0$, $x_2 = 0$, $x_3 = 1$ can be written as the
cortege 001. Hereafter we shall frequently call these corteges simply
sets (the arguments are always numbered in a definite order, namely
the order in which they occur in the notation $f(x_1, x_2, \ldots, x_n)$
of the boolean function). The term boolean applied to a cortege (set)
means that the cortege is composed of zeros and ones.

Each cortege of length n, composed of zeros and ones (a boolean 
cortege), can be identified with some vertex of an n-dimensional unit 
cube having the corresponding coordinates. For the two-dimensional 
case, when the n-dimensional cube reduces to a square, the method of 
identification of the boolean corteges with the vertices is shown in 
Fig. 4. As a result of the possibility of such identification, the 
boolean sets (corteges) will sometimes be termed points. 


[Fig. 4. Identification of boolean corteges with the vertices of the unit square.]

In the present chapter we shall limit ourselves (except in specially
stipulated cases) to the consideration of only those boolean functions
whose domain of definition includes all sets of values of their
arguments. Thus, an n-place boolean function must be defined at $2^n$
different points. A boolean function which may be undefined on at least
one of the sets is called a partial boolean function. The consideration
of partial boolean functions is useful for the synthesis of circuits of
discrete automata. From the theoretical point of view, the
everywhere-defined boolean functions are of particular interest, the
more so since, when necessary, every partial boolean function can be
extended (generally speaking, in an arbitrary way) to those sets on
which it was not initially defined. Therefore, when speaking of boolean
functions hereafter (unless stipulated otherwise), we shall mean these
everywhere-defined functions.

We remark also that in considering a particular boolean function
we shall regard the number of its arguments as given. This stipulation
is necessary because every n-place function can be treated as an
(n + 1)-place, (n + 2)-place, and in general as an (n + k)-place
function for any natural number k. Indeed, the constant functions
(identically equal to zero or to one), for example, can if desired be
considered as functions of any number of arguments, on which they do
not in fact depend. Similarly, we can add to any function
$f(x_1, x_2, \ldots, x_n)$ any desired number of new arguments
$x_{n+1}, \ldots, x_{n+k}$ on which the values of the function do not
actually depend. For this it is sufficient to assume that for all sets
of values of the variables $x_1, x_2, \ldots, x_{n+k}$ the following
equality holds:

$$f(x_1, x_2, \ldots, x_n, x_{n+1}, \ldots, x_{n+k}) = f(x_1, x_2, \ldots, x_n).$$

We shall call the described conversion of an n-place function into an
(n + k)-place function the operation of formal assignment of arguments.
This operation is obviously applicable to any functions (not only
boolean ones).
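The operation of formal assignment of arguments can be illustrated with a short sketch in Python (the language is our choice; the book itself contains no code). We assume, purely for illustration, that a boolean function is an ordinary Python callable taking the integers 0 and 1; the name `add_dummy_arguments` is hypothetical.

```python
from typing import Callable

BoolFn = Callable[..., int]


def add_dummy_arguments(f: BoolFn, n: int, k: int) -> BoolFn:
    """Turn the n-place function f into an (n + k)-place function whose
    extra arguments x_{n+1}, ..., x_{n+k} do not affect the value."""

    def extended(*args: int) -> int:
        if len(args) != n + k:
            raise TypeError(f"expected {n + k} arguments, got {len(args)}")
        # Only the first n arguments are passed on to f.
        return f(*args[:n])

    return extended


# Negation, treated formally as a three-place function.
negation = lambda x: 1 - x
negation3 = add_dummy_arguments(negation, 1, 2)
print(negation3(0, 1, 1), negation3(1, 0, 0))  # 1 0
```

The same wrapper also turns a constant (a 0-place function) into a function of any number of arguments, as the text describes.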
As we noted above, there are exactly $2^n$ boolean sets (corteges)
of length n. These sets can be considered as representations of certain
whole numbers in the binary number system: the set $a_1 a_2 \ldots a_n$
is identified with the binary representation of the number
$a_1 \cdot 2^{n-1} + a_2 \cdot 2^{n-2} + \ldots + a_{n-1} \cdot 2 + a_n$
(here the boolean values 0 and 1 are treated simply as the ordinary
numbers 0 and 1). We shall call this number the number of the
corresponding set. The numbers of the sets range from zero (for the set
consisting only of zeros) to $2^n - 1$ (for the set consisting only of
ones). The number of the set 010 is $0 \cdot 2^2 + 1 \cdot 2 + 0 = 2$,
the number of the set 101 is $1 \cdot 2^2 + 0 \cdot 2 + 1 = 5$, and so on.
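The numbering of sets can be checked with a small Python sketch (our illustration, assuming a cortege is written as a string of the characters '0' and '1'; the function names are hypothetical). It evaluates $a_1 \cdot 2^{n-1} + \ldots + a_n$ by Horner's scheme and lists the sets in increasing order of their numbers.

```python
def set_number(cortege: str) -> int:
    """Number of the boolean set a1 a2 ... an, i.e.
    a1*2**(n-1) + a2*2**(n-2) + ... + an."""
    number = 0
    for digit in cortege:
        # Shift the digits read so far one binary place left, add the next digit.
        number = 2 * number + int(digit)
    return number


def sets_in_order(n: int) -> list[str]:
    """All 2**n boolean sets of length n, in increasing order of their numbers."""
    return [format(k, f"0{n}b") for k in range(2 ** n)]


print(set_number("010"), set_number("101"))  # 2 5
print(sets_in_order(3))
# ['000', '001', '010', '011', '100', '101', '110', '111']
```

Python's built-in `int(cortege, 2)` gives the same number; the explicit loop is kept only to mirror the formula.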

Arranging the sets in a column one after another in order of
increasing number, and placing alongside each set the value of the
boolean function on that set, we obtain the value table of the boolean
function. Since on each set the function can take either of two values
(0 or 1) regardless of its values on the remaining sets, for m sets we
can define exactly $2^m$ different boolean functions (differing from one
another in their values on at least one set). Recalling that the total
number of sets for n variables is $2^n$, we conclude that the number of
different boolean functions of n arguments, which we shall denote
B(n), is given by

$$B(n) = 2^{2^n}. \tag{9}$$

With n = 1 the quantity B(n) equals 4, and each time n increases by 1
this quantity is squared: $B(n+1) = (B(n))^2$. Thus, while there are
only 4 single-place boolean functions, there are 16 two-place
functions, 256 three-place, $256^2 = 65{,}536$ four-place,
$65{,}536^2 \approx 4.3 \cdot 10^9$ five-place, about $1.8 \cdot 10^{19}$
six-place, and so on. The practical possibility of enumerating all the
boolean functions is thus limited to the three-place or, at best, the
four-place functions.
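Formula (9) and the squaring rule can be cross-checked in a few lines of Python (a sketch of our own; the function names are illustrative). Direct enumeration of value tables is run only up to n = 3, in keeping with the remark that exhaustive enumeration quickly becomes impractical.

```python
from itertools import product


def count_by_formula(n: int) -> int:
    """B(n) = 2**(2**n), formula (9)."""
    return 2 ** (2 ** n)


def count_by_enumeration(n: int) -> int:
    """Count value tables directly: one 0/1 entry for each of the 2**n sets."""
    return sum(1 for _ in product((0, 1), repeat=2 ** n))


for n in range(1, 4):
    assert count_by_formula(n) == count_by_enumeration(n)
    assert count_by_formula(n + 1) == count_by_formula(n) ** 2

print([count_by_formula(n) for n in range(1, 7)])
# [4, 16, 256, 65536, 4294967296, 18446744073709551616]
```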

Although every boolean function can be given in the form of its
value table, in the majority of practical applications of the theory of
boolean functions this method of specification is inconvenient.
Therefore, one of the primary tasks of our further constructions will
be the development of new and more convenient methods of specifying
boolean functions. In this connection the boolean functions of one and
two arguments are of particular importance since, as will be shown
later, any boolean function can be represented with their aid. We shall
therefore make a more detailed study of the single-place and two-place
boolean functions.

Of the four single-place functions φ(x) which can be constructed,
two are the constants 0 and 1, which do not actually depend on x.
Another function simply repeats the value of its argument, φ(x) = x,
and is therefore also of no interest. The last, fourth, function, for
which we introduce the special notation $\bar{x}$ or ¬x, always takes
the value opposite to that of its argument: $\bar{0} = 1$ and
$\bar{1} = 0$. This function is called inversion or negation. The
expression $\bar{x}$ (and also the expression ¬x) is read as "negation
of x" or "not x." In the theory of boolean functions, and also in the
applications of this theory to the synthesis of automata circuits, we
shall follow tradition and use the notation $\bar{x}$. In mathematical
logic (end of the present chapter and beginning of the sixth chapter),
and also in the practical aspects of the theory of algorithms (end of
the fifth chapter), it is for several reasons more convenient to use
the notation ¬x.

Of the 16 different two-place boolean functions f(x, y) which can be
constructed, six reduce to functions of a smaller number of arguments.
These are, first, again the two constant functions (0 and 1); second,
the two functions which repeat the value of one of the arguments (x or
y); and third, the two functions which are the negations of the
arguments ($\bar{x}$ and $\bar{y}$).

The ten remaining functions f(x, y), which actually depend on both
of their arguments, can be divided into pairs such that the second
function of each pair is the negation of the first (i.e., it takes on
each set the value opposite to the value of the first function). Here
the single-place boolean function $\bar{x}$ is in effect used to
construct the single-place negation operation on the set of all boolean
functions. Applying the negation operation to any boolean function φ
can be treated as substituting φ in place of the argument x of the
function $\bar{x}$. Such a substitution of some boolean functions in
place of the arguments of other boolean functions (called the
superposition of these functions) will be widely used hereafter to form
various operations on the set of boolean functions (boolean
operations). To denote the operations thus constructed, we usually use
the notation of the boolean functions which generated them. In our case
$\bar{\varphi}$ (or ¬φ) will serve as the notation for the negation of
an arbitrary boolean function φ.
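Negation as an operation on functions, obtained by superposition, can be sketched in Python as follows (an illustration under our own conventions: a boolean function is a callable on 0/1 integers, and the helper names are hypothetical). The example uses the product xy, described below, and prints its value table before and after negation.

```python
from itertools import product
from typing import Callable

BoolFn = Callable[..., int]


def negation(x: int) -> int:
    """The single-place function x-bar: 0 -> 1, 1 -> 0."""
    return 1 - x


def negate(phi: BoolFn) -> BoolFn:
    """Negation as an operation on functions: substitute phi in place of
    the argument of negation (a superposition)."""
    return lambda *args: negation(phi(*args))


def value_table(f: BoolFn, n: int) -> str:
    """Values of f on all 2**n sets, in increasing order of set number."""
    return "".join(str(f(*s)) for s in product((0, 1), repeat=n))


product_xy = lambda x, y: x * y   # the product xy, described below
print(value_table(product_xy, 2), value_table(negate(product_xy), 2))  # 0001 1110
```

Because `itertools.product((0, 1), repeat=n)` yields the sets in lexicographic order, the value tables come out in increasing order of set number.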

The separation of the two-place boolean functions into pairs
$(\varphi, \bar{\varphi})$ described above makes it possible to limit
ourselves to describing only five functions, which we select as the
first elements of these pairs.

Let us begin the description with conjunction, also called the
(logical) product, or the logical AND operation. In mathematical logic
it is customary to denote the conjunction of the variables x and y by
x & y or $x \wedge y$ (we shall use the second of these notations). By
definition, the conjunction $x \wedge y$ is equal to one when, and only
when, both of its arguments x and y are equal to one.

For the conjunction $x \wedge y$ to be equal to zero it is
sufficient that at least one of its arguments (x or y) be zero. These
properties of the conjunction are completely analogous to the
properties which the product xy would have if its cofactors could take
only the two numerical values 0 and 1. This circumstance suggests
regarding the boolean constants (0 and 1) as a sort of "pseudo-numbers"
for which a multiplication operation is defined that possesses all the
properties of the ordinary multiplication of the numbers 0 and 1:

$$0 \cdot 0 = 0, \quad 0 \cdot 1 = 0, \quad 1 \cdot 0 = 0, \quad 1 \cdot 1 = 1.$$

In the theory of boolean functions and in its applications to the
theory of automata it is convenient to adopt precisely this point of
view. Moreover, in these cases we shall simply identify the conjunction
operation with multiplication, both in name and in notation. In other
words, in place of $x \wedge y$ we shall write $x \cdot y$, or xy, and
shall use the terms "product" and "cofactor" together with all the
properties of multiplication from conventional elementary algebra. It
is easy to see that, because the definitions coincide, multiplication
in our case has all the general (identically satisfied) properties of
multiplication in conventional algebra (commutativity, associativity,
and so on). At the same time, restricting the set of possible values of
the quantities gives the logical multiplication under consideration
some properties which conventional multiplication lacks. For example,
the identity $x \cdot x = x$ holds for logical multiplication, but it
becomes invalid if numerical values other than 0 and 1 are substituted
for x.

Just as in the case of negation, multiplication (conjunction) can 
be considered not only as a function, but also as an operation on the 
set of all boolean functions. For this purpose it is sufficient in
place of the independent variables x and y to substitute in the pro- 
duct xy two arbitrary boolean functions f and g: p = fg. Similarly, any 
other two-place boolean function b(x, y) defines a two-place, or, as 
it is usually customary to say in algebra, binary operation on the set 
of all boolean functions, which we shall term and designate just the 
same as the corresponding function b(x, y). Of course, in this case 
the independent variables x and y are replaced by the arbitrary bool- 
ean functions f and g. Hereafter we shall use the described technique 
for the introduction of new binary operations on the set of boolean 
functions without detailed explanations. 

The possibility of the interpretation of conjunction as conven- 
tional multiplication suggests also looking for boolean analogs for 
conventional (numerical) addition. In contrast with multiplication, 
here there cannot be complete analogy, of course, since the equality 
1 + 1 = 2 in the case of conventional addition introduces a third
quantity (two) which differs from both zero and one. With the limitation
to only the boolean (binary) alphabet, the direct interpretation of 
this fact is, of course, impossible. Therefore we can define two dif- 
ferent (but incomplete) analogs of numerical addition for the boolean 
quantities, setting the "sum" of two ones equal to either one or zero. 

The operation (two-place boolean function) which arises under the
first assumption is called disjunction, logical addition, and also
logical (the so-called inclusive) OR. To denote this operation we
introduce the special symbol ∨ (the disjunction sign). Thus, the
disjunction of the two quantities x and y (independent variables or
functions) will be written $x \vee y$. The quantities x and y themselves
are called the logical addends or, more frequently, the disjunctive
terms.

The system of relations which completely defines the operation of
disjunction is

$$0 \vee 0 = 0, \quad 0 \vee 1 = 1, \quad 1 \vee 0 = 1, \quad 1 \vee 1 = 1.$$

The first three relations are exactly the same as in conventional
(numerical) addition, and only the fourth distinguishes logical
addition from conventional addition. By these relations, the
disjunction of the two quantities x and y is equal to zero when and
only when both quantities are zero. If even one of them takes the value
1, then the disjunction itself takes this same value 1, regardless of
the value of the other disjunctive term.

A closer analog of conventional (numerical) addition is obtained
when the "sum" of two ones is taken to be zero. The operation
(two-place boolean function) which arises in this case is usually
called the nonequivalence operation, exclusive OR, and also modulo two
addition. The last term reflects the fact that this operation coincides
with modulo two addition as defined in number theory if zero and one
are treated as ordinary numbers.

For brevity we shall term this operation simply addition and 
shall use such terms as sum and addend by analogy with conventional ad- 
dition. We shall use the usual (+) sign to designate the operation of 
modulo two addition. In order to emphasize that we are not discussing 
conventional addition, we will at times put a circle around this sym- 
bol. 

The operation of addition of boolean quantities is defined by the
following four relations:

$$0 + 0 = 0, \quad 0 + 1 = 1, \quad 1 + 0 = 1, \quad 1 + 1 = 0.$$

The first three of them are exactly the same as in conventional
(numerical) addition (and the same as in logical addition, i.e.,
disjunction), so that the specific nature of the operation introduced
is determined primarily by the fourth relation. This same relation
accounts for the name of the addition operation used in mathematical
logic, exclusive OR. If we interpret one as true and zero as false,
then the sum of two boolean quantities is true when and only when
either the first or the second quantity is true, but not when both are
true. In the case of the logical sum (inclusive OR), the sum is also
true when both addends (disjunctive terms) are true together. OR in
this case does not exclude the simultaneous truth of both terms and
does not split the question of the truth of the sum into two mutually
exclusive cases; this is the source of the term "inclusive" as applied
to the OR of the logical sum (disjunction).
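The three operations introduced so far can be written as ordinary arithmetic on the integers 0 and 1, which makes the "pseudo-number" analogy concrete. The encodings below ($x + y - xy$ for disjunction, $(x + y) \bmod 2$ for the sum) are our own choice for this Python sketch; other equivalent encodings exist, for example Python's bitwise `|` and `^` on single bits.

```python
def conjunction(x: int, y: int) -> int:
    # Ordinary multiplication, restricted to the values 0 and 1.
    return x * y


def disjunction(x: int, y: int) -> int:
    # Agrees with ordinary addition except that 1 v 1 = 1.
    return x + y - x * y


def sum_mod2(x: int, y: int) -> int:
    # Agrees with ordinary addition except that 1 + 1 = 0.
    return (x + y) % 2


print("x y | xy  x v y  x + y")
for x in (0, 1):
    for y in (0, 1):
        print(f"{x} {y} | {conjunction(x, y)}   {disjunction(x, y)}      {sum_mod2(x, y)}")
```

The printed table reproduces the three systems of defining relations given above.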

Two more two-place boolean functions result from a single binary
operation called implication, or the operation of logical consequence.
We use the symbol ⊃ to denote this operation. Implication is defined
by the following four relations:

$$0 \supset 0 = 1, \quad 0 \supset 1 = 1, \quad 1 \supset 0 = 0, \quad 1 \supset 1 = 1.$$

In the implication $x \supset y$, in contrast with multiplication,
disjunction and addition, the order of the terms is essential. When
this order is reversed, the value of the implication changes, so that
$x \supset y$ and $y \supset x$ are two different boolean functions.

If we denote the two-place boolean function f(x, y) by the cortege
$(\alpha_0 \alpha_1 \alpha_2 \alpha_3)$, where $\alpha_i$ is the value
taken by the function on the set with number i (i = 0, 1, 2, 3), then
the implication $x \supset y$ corresponds to the cortege (1101), while
the implication $y \supset x$ corresponds to the cortege (1011). We note
at the same time that the product, disjunction and sum of the variables
x and y, regardless of the order in which these variables are written,
correspond respectively to the corteges (0001), (0111) and (0110).

From a consideration of all the corteges presented it follows,
incidentally, that all five two-place boolean functions which we have
defined (product, disjunction, sum and the two implications) are
pairwise different. It is easy to see that the cortege for the negation
of any boolean function is obtained from the cortege for the function
itself by replacing all zeros by ones and all ones by zeros. Using this
rule, we can determine the corteges for the negation of the product xy,
the negation of the disjunction $x \vee y$, the negation of the sum
x + y, and the negations of the two implications $x \supset y$ and
$y \supset x$. These corteges are, respectively, (1110), (1000), (1001),
(0010) and (0100).

It is easy to verify that, together with the five functions
previously introduced, the five new functions (the negations of the
preceding five) form a system of ten pairwise different two-place
boolean functions. They all differ also from the constant functions 0
and 1 and the functions x, y, $\bar{x}$, $\bar{y}$, considered as
functions of the two variables x and y, since the latter are
characterized by the corteges (0000), (1111), (0011), (0101), (1100),
(1010), respectively. Thus we have written out all 16 two-place boolean
functions which can be constructed.
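The claim that these sixteen corteges exhaust all two-place functions can be checked mechanically. In this Python sketch (our illustration; the ASCII labels such as `x -> y` for $x \supset y$ are our own), each of the five basic functions is defined directly by its defining relations, the negations are formed by superposition, and the six degenerate functions are added.

```python
from itertools import product

SETS = list(product((0, 1), repeat=2))  # (00), (01), (10), (11): set numbers 0..3


def cortege(f) -> str:
    """Cortege (a0 a1 a2 a3) of values of f on the sets numbered 0..3."""
    return "".join(str(f(x, y)) for x, y in SETS)


basic = {
    "xy": lambda x, y: x * y,
    "x v y": lambda x, y: 0 if (x, y) == (0, 0) else 1,
    "x + y": lambda x, y: (x + y) % 2,
    "x -> y": lambda x, y: 0 if (x, y) == (1, 0) else 1,
    "y -> x": lambda x, y: 0 if (x, y) == (0, 1) else 1,
}
# Negation of each basic function, formed by superposition.
negations = {f"not({name})": (lambda g: lambda x, y: 1 - g(x, y))(f)
             for name, f in basic.items()}
degenerate = {
    "0": lambda x, y: 0,
    "1": lambda x, y: 1,
    "x": lambda x, y: x,
    "y": lambda x, y: y,
    "not x": lambda x, y: 1 - x,
    "not y": lambda x, y: 1 - y,
}

table = {name: cortege(f) for name, f in {**basic, **negations, **degenerate}.items()}
for name, c in table.items():
    print(f"{name:12} {c}")
assert len(set(table.values())) == 16  # all 16 two-place functions, pairwise different
```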

Let us make a few more remarks concerning the functions introduced
above. The function $\overline{xy}$ (the negation of the product),
characterized by the cortege (1110), and the binary operation defined
by it are customarily called the Sheffer stroke. It is easy to verify
(using the definitions of negation and disjunction) that the Sheffer
stroke can be represented not only as the negation of the product xy,
but also as the disjunction of the negations, $\bar{x} \vee \bar{y}$.

The negation of the disjunction, $\overline{x \vee y}$, the so-called
Peirce function, characterized by the cortege (1000), can also be
represented as the product of the negations of the variables x and y,
i.e., in the form $\bar{x} \cdot \bar{y}$. It is easy to see that both
the Sheffer stroke and the Peirce function, like the product,
disjunction and sum, are symmetric functions, i.e., they do not change
their values under permutation of the arguments.

The negation of the sum, $\overline{x + y}$, called the equivalence
operation or logical equivalence, has the same property. To denote
this function and the binary operation defined by it, we use the
special symbols ~ or ≡ (the equivalence sign). The function
$\overline{x + y} = x \sim y$ is characterized by the cortege (1001).
The terms "equivalence" and "nonequivalence" as applied to the
functions $x \sim y$ and x + y, respectively, emphasize that the first
function is equal to one when and only when the values of its arguments
are equal to one another, and the second when and only when the values
of its arguments are unequal.

The function (binary operation) of implication can be expressed by
disjunction and negation. It is easy to verify that
$x \supset y = \bar{x} \vee y$ and $y \supset x = x \vee \bar{y}$. The
negation of an implication, also called the inhibit function, is easily
expressed by the product and negation:
$\overline{x \supset y} = x \cdot \bar{y}$,
$\overline{y \supset x} = \bar{x} \cdot y$. Both implication and the
inhibit function are examples of asymmetric boolean functions, since
they change their values under permutation of the arguments.
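Each "it is easy to verify" in the last few paragraphs amounts to comparing two functions on the four sets. The Python sketch below (our illustration; the labels and helper names are hypothetical, and Python's `|` on 0/1 integers is used as disjunction) checks the stated representations and the symmetry claims in exactly that way.

```python
from itertools import product

SETS = list(product((0, 1), repeat=2))


def neg(x: int) -> int:
    return 1 - x


def implies(x: int, y: int) -> int:
    # By definition: x -> y is 0 only on the set x = 1, y = 0.
    return 0 if (x, y) == (1, 0) else 1


def same_function(f, g) -> bool:
    """True if the two-place functions f and g agree on all four sets."""
    return all(f(x, y) == g(x, y) for x, y in SETS)


def is_symmetric(f) -> bool:
    """True if f does not change its value when its arguments are swapped."""
    return all(f(x, y) == f(y, x) for x, y in SETS)


claims = {
    "Sheffer: not(xy) = not x v not y":
        (lambda x, y: neg(x * y), lambda x, y: neg(x) | neg(y)),
    "Peirce: not(x v y) = not x . not y":
        (lambda x, y: neg(x | y), lambda x, y: neg(x) * neg(y)),
    "x ~ y is 1 exactly when x = y":
        (lambda x, y: neg((x + y) % 2), lambda x, y: int(x == y)),
    "x -> y = not x v y":
        (implies, lambda x, y: neg(x) | y),
    "y -> x = x v not y":
        (lambda x, y: implies(y, x), lambda x, y: x | neg(y)),
    "not(x -> y) = x . not y":
        (lambda x, y: neg(implies(x, y)), lambda x, y: x * neg(y)),
    "not(y -> x) = not x . y":
        (lambda x, y: neg(implies(y, x)), lambda x, y: neg(x) * y),
}
results = {name: same_function(f, g) for name, (f, g) in claims.items()}
for name, ok in results.items():
    print(ok, name)

print(is_symmetric(lambda x, y: neg(x * y)), is_symmetric(implies))  # True False
```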

In conclusion we note that in reading formulas the conjunction
symbol ∧ (or &) is pronounced "and," the disjunction symbol ∨ is read
"or," the sum sign + (or ⊕) is read "plus," the implication sign ⊃ is
read "implies," the equivalence sign ~ (or ≡) is read "equivalent," and
the negation sign (the bar, or ¬) is read "not."

All the ten listed two-place boolean functions correspond to the 
respective two-place boolean operations, which we shall designate and 
name exactly the same as the functions which define them. 

§2. BOOLEAN ALGEBRA 

By boolean algebra we shall mean the set of all (finite-place)
boolean functions considered together with the operations of negation,
disjunction and multiplication (conjunction) defined on them.

We shall use the letters u, v, w, ... (with or without subscripts)
to denote arbitrary elements of boolean algebra, in other words,
arbitrary boolean functions. One of the primary problems of boolean
algebra is the establishment of identity relations of the form
$A(u, v, w, \ldots) = B(u, v, w, \ldots)$, where $A(u, v, w, \ldots)$
and $B(u, v, w, \ldots)$ denote formulas, i.e., expressions of boolean
algebra constructed from a finite number of letters u, v, w, ..., the
signs of the three operations of the algebra, the boolean constants
(0 and 1), and parentheses indicating the order in which the
operations are performed.

The formulas must be properly constructed. In other words, they
must reduce to completely determinate boolean functions once particular
boolean functions are chosen as the values of the letters u, v, w, ...
appearing in them. A rigorous formal definition of a properly
constructed formula can be given recursively by the following rule:
all the letters u, v, w, ... (with or without subscripts) and the
constants 0 and 1 are properly constructed formulas; if A and B are
properly constructed formulas, then $\overline{(A)}$, $(A) \vee (B)$
and $(A) \cdot (B)$ are also properly constructed formulas. The set of
properly constructed formulas is taken to coincide with the set of all
formulas which can be obtained by sequential (generally speaking,
repeated) application of this rule.
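The recursive rule maps naturally onto a small tree data type. The Python sketch below is our own illustration: the class names `Var`, `Const`, `Not`, `Or`, `And` are hypothetical, the bar of negation is rendered in plain text as `not(...)`, and the product sign as `.`. `to_text` writes every pair of parentheses the rule requires, and `evaluate` computes the value of a formula for given 0/1 values of its letters.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Var:          # a letter u, v, w, ...
    name: str


@dataclass(frozen=True)
class Const:        # the constant 0 or 1
    value: int


@dataclass(frozen=True)
class Not:          # negation of (A)
    arg: "Formula"


@dataclass(frozen=True)
class Or:           # (A) v (B)
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class And:          # (A) . (B)
    left: "Formula"
    right: "Formula"


Formula = Union[Var, Const, Not, Or, And]


def to_text(a: Formula) -> str:
    """Write the formula with all parentheses required by the recursive rule."""
    if isinstance(a, Var):
        return a.name
    if isinstance(a, Const):
        return str(a.value)
    if isinstance(a, Not):
        return f"not({to_text(a.arg)})"
    sign = " v " if isinstance(a, Or) else " . "
    return f"({to_text(a.left)}){sign}({to_text(a.right)})"


def evaluate(a: Formula, values: Dict[str, int]) -> int:
    """Value (0 or 1) of the formula for given 0/1 values of its letters."""
    if isinstance(a, Var):
        return values[a.name]
    if isinstance(a, Const):
        return a.value
    if isinstance(a, Not):
        return 1 - evaluate(a.arg, values)
    left, right = evaluate(a.left, values), evaluate(a.right, values)
    return max(left, right) if isinstance(a, Or) else left * right


f = Or(And(Var("u"), Var("v")), Not(Var("w")))
print(to_text(f))                               # ((u) . (v)) v (not(w))
print(evaluate(f, {"u": 1, "v": 1, "w": 1}))    # 1
print(evaluate(f, {"u": 0, "v": 1, "w": 1}))    # 0
```

Only objects built from these five constructors count as formulas here, which mirrors the statement that the properly constructed formulas are exactly those obtainable by repeated application of the rule.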

The introduction of each additional operation into a formula is
accompanied by the appearance of one or two pairs of parentheses. To
avoid excessively cumbersome formulas, we somewhat extend the rule for
the construction of formulas, making it possible to drop some
parentheses, as is done in elementary algebra. To do this we introduce
a rule on the priority of operations: other conditions being equal,
negations are performed first, then multiplications, then disjunctions.
When operations must be performed in a different order, parentheses are
required. In addition, a negation sign written as a bar over an entire
expression makes unnecessary the parentheses in which that expression
would otherwise have had to be enclosed. It will also be established
later that the order in which like operations immediately following one
another in a formula are performed does not matter, so that in this
case the parentheses are again redundant and can be dropped. Finally,
we recall that the multiplication sign between letters can be omitted.

All the properly constructed formulas obtained as the result of 
the described expansions will hereafter be termed simply formulas, per- 
mitting using in them in addition to the letters u,v,w, ... any other 
letters of the Latin alphabet. 

There is a very simple general rule for the verification of the 
correctness of the identity relations in boolean algebra. The essence 
of this rule amounts to the following. 

Every formula $A(u, v, w, \ldots)$ of boolean algebra can be regarded
as the representation of some boolean function of the variables
u, v, w, .... Indeed, if we assign these variables constant values
(0 or 1), then, using the relations which define the operations of
negation, disjunction and multiplication (i.e., relations of the form
$\bar{0} = 1$, $0 \vee 1 = 1$, and so on), we can find in a finite
number of steps the value (0 or 1) of the formula $A(u, v, w, \ldots)$
for the chosen values of u, v, w, ...; this means that our formula is
an everywhere-defined boolean function of the variables u, v, w, ....
It is easy to see that the (identity) relation
$A(u, v, w, \ldots) = B(u, v, w, \ldots)$ holds if and only if the
formulas $A(u, v, w, \ldots)$ and $B(u, v, w, \ldots)$ represent one and
the same boolean function of the variables u, v, w, .... To verify that
the two representations are equal in this sense, it is sufficient to
check whether their values coincide on all sets of values of the
variables u, v, w, ....
We have thereby constructed a general algorithm suitable for
verifying the correctness of any identity relation in boolean algebra:
since the number of sets of values of any finite number of boolean
variables is finite, the verification described always terminates
after a finite number of steps. Moreover, it becomes clear that it is
sufficient to establish the identity relations of boolean algebra for
the case where all the letters appearing in them are treated as
independent (boolean) variables. When necessary, arbitrary boolean
functions can then be substituted in place of these variables.
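The verification algorithm just described can be sketched in Python as an exhaustive search for a counterexample over all $2^n$ sets (our illustration; both sides of a relation are given as callables on 0/1 integers, the helper names are hypothetical, and `|` serves as disjunction). The search always terminates because the number of sets is finite.

```python
from itertools import product
from typing import Callable, Optional, Tuple

BoolFn = Callable[..., int]


def counterexample(lhs: BoolFn, rhs: BoolFn, n: int) -> Optional[Tuple[int, ...]]:
    """Return a set of values on which lhs and rhs differ, or None if there is none."""
    for values in product((0, 1), repeat=n):   # all 2**n sets, finitely many
        if lhs(*values) != rhs(*values):
            return values
    return None


def is_identity(lhs: BoolFn, rhs: BoolFn, n: int) -> bool:
    """The relation lhs = rhs is an identity iff no counterexample exists."""
    return counterexample(lhs, rhs, n) is None


# xy = yx holds identically.
print(is_identity(lambda x, y: x * y, lambda x, y: y * x, 2))           # True
# Disjunction and modulo two sum are different functions: they differ on (1, 1).
print(counterexample(lambda x, y: x | y, lambda x, y: (x + y) % 2, 2))  # (1, 1)
```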
We shall denote the independent variables by the letters x, y, z
(with or without subscripts), and shall use these same letters for
writing the identity relations of boolean algebra. We shall verify
these relations by substituting into them all possible sets of values
of the variables (letters) appearing in them.


As an example let us consider the commutativity relation for
multiplication

$$xy = yx. \tag{10}$$

To convince ourselves of the correctness of this relation it is
sufficient to note that its left and right sides are equal to zero on
the sets (00), (01), (10) and equal to one on the set (11). In view of
the triviality of such a verification we shall not repeat it in the
future and shall limit ourselves to simply writing out the relations we
need, which we shall also call laws or rules.

In addition to the law of commutativity, multiplication also
satisfies the so-called law of associativity, expressed by the equality

$$x(yz) = (xy)z. \tag{11}$$


Multiplication satisfies still another law, usually called the
idempotency law:

$$xx = x. \tag{12}$$

As a result of this law, the concepts of power and raising to a power
have no real significance in boolean algebra.
The laws of commutativity, associativity and idempotency extend
also to the disjunction operation. The corresponding relations are
written

$$x \vee y = y \vee x; \tag{13}$$
$$x \vee (y \vee z) = (x \vee y) \vee z; \tag{14}$$
$$x \vee x = x. \tag{15}$$


Multiplication and disjunction are related to one another by the
first and second distributive laws, which can be expressed by the
relations

$$x(y \vee z) = xy \vee xz; \tag{16}$$
$$x \vee yz = (x \vee y)(x \vee z). \tag{17}$$
We note that, on the strength of the conventions adopted on the
priority of operations, the right side of relation (16) is a
simplification (obtained by discarding the redundant parentheses and
the multiplication sign) of the formula $(x \cdot y) \vee (x \cdot z)$,
while the left side of relation (17) is a simplification of the formula
$(x) \vee (y \cdot z)$.
For multiplication and disjunction the so-called absorption rules
are valid, expressed by the following relations:

$$x \vee xy = x; \tag{18}$$
$$x(x \vee y) = x. \tag{19}$$
For the negation operation the law of double negation is of great
importance:

$$\bar{\bar{x}} = x. \tag{20}$$

On the strength of this law any even number of negations performed in
sequence does not alter the result, while any odd number is equivalent
to a single negation.

For various transformations in boolean algebra we frequently need
the so-called de Morgan rules, which combine all three algebraic
operations:

$$\overline{xy} = \bar{x} \vee \bar{y}; \tag{21}$$
$$\overline{x \vee y} = \bar{x}\,\bar{y}. \tag{22}$$
We point out several more relations which include the constants
0 and 1:

$$x \vee \bar{x} = 1; \tag{23}$$
$$x\bar{x} = 0; \tag{24}$$
$$x \cdot 0 = 0; \tag{25}$$
$$x \cdot 1 = x; \tag{26}$$
$$x \vee 0 = x; \tag{27}$$
$$x \vee 1 = 1; \tag{28}$$
$$\bar{0} = 1; \tag{29}$$
$$\bar{1} = 0. \tag{30}$$

Relation (23) is called the law of the excluded middle, and relation
(24) the law of contradiction. Relations (25) and (28) can be
considered as particular cases of the absorption rules.
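Since each of relations (10) to (30) involves at most three variables, all of them can be confirmed by brute force in a few lines. In this Python sketch (our illustration), each law is encoded as a tuple (number, number of variables, left side, right side), with `&` and `|` on 0/1 integers standing for the product and disjunction and `neg` for negation.

```python
from itertools import product


def neg(x: int) -> int:
    return 1 - x


# (relation number, number of variables, left side, right side)
LAWS = [
    (10, 2, lambda x, y: x & y, lambda x, y: y & x),
    (11, 3, lambda x, y, z: x & (y & z), lambda x, y, z: (x & y) & z),
    (12, 1, lambda x: x & x, lambda x: x),
    (13, 2, lambda x, y: x | y, lambda x, y: y | x),
    (14, 3, lambda x, y, z: x | (y | z), lambda x, y, z: (x | y) | z),
    (15, 1, lambda x: x | x, lambda x: x),
    (16, 3, lambda x, y, z: x & (y | z), lambda x, y, z: (x & y) | (x & z)),
    (17, 3, lambda x, y, z: x | (y & z), lambda x, y, z: (x | y) & (x | z)),
    (18, 2, lambda x, y: x | (x & y), lambda x, y: x),
    (19, 2, lambda x, y: x & (x | y), lambda x, y: x),
    (20, 1, lambda x: neg(neg(x)), lambda x: x),
    (21, 2, lambda x, y: neg(x & y), lambda x, y: neg(x) | neg(y)),
    (22, 2, lambda x, y: neg(x | y), lambda x, y: neg(x) & neg(y)),
    (23, 1, lambda x: x | neg(x), lambda x: 1),
    (24, 1, lambda x: x & neg(x), lambda x: 0),
    (25, 1, lambda x: x & 0, lambda x: 0),
    (26, 1, lambda x: x & 1, lambda x: x),
    (27, 1, lambda x: x | 0, lambda x: x),
    (28, 1, lambda x: x | 1, lambda x: 1),
    (29, 0, lambda: neg(0), lambda: 1),
    (30, 0, lambda: neg(1), lambda: 0),
]


def holds(n, lhs, rhs) -> bool:
    """Check lhs = rhs on all 2**n sets (for n = 0 there is one empty set)."""
    return all(lhs(*s) == rhs(*s) for s in product((0, 1), repeat=n))


failed = [num for num, n, lhs, rhs in LAWS if not holds(n, lhs, rhs)]
print("failed:", failed)  # failed: []
```

As the text goes on to note, this kind of exhaustive check becomes impractical as the number of variables grows, which is why the relations themselves are then used as rewriting rules.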

Let us consider some corollaries of this system of relations. From
the laws of commutativity and associativity for disjunction and
multiplication follows the possibility of performing in any order the
operations needed to find the value of a product or a disjunction of
any finite number of terms. From this follows the previously noted
possibility of writing formulas of the form
$x_1 \vee x_2 \vee \ldots \vee x_m$ and $x_1 x_2 \ldots x_m$ without
parentheses, with no chance of ambiguity arising from variations in the
order of performing the operations.

We note also that, as follows from relations (25) and (28), the
presence of even a single one in a disjunction of the form
$x_1 \vee x_2 \vee \ldots \vee x_n$ is sufficient to turn the entire
disjunction into one, just as the presence of even a single zero
cofactor in the product $x_1 x_2 \ldots x_n$ turns the entire product
into zero. At the same time, relations (26) and (27) show that in any
disjunction the terms equal to zero can be dropped, and in any product
the cofactors equal to one can be dropped.

On the strength of relation (20), any number of negations performed in sequence reduces either to a single negation or to no negation at all. We shall use $\tilde{x}$ (read "wavy x") to designate an expression which can be equal to either of the two expressions $x$ or $\bar{x}$. Following the rule established above for the verification of identity relations in boolean algebra, we shall call formulas equal, or equivalent, to one another when they represent the same boolean function of the variables appearing in them. Although the equality or inequality of any two formulas of boolean algebra can in principle be verified by sorting through all possible combinations of the values of the variables appearing in them, this method becomes excessively cumbersome as the number of variables grows (there are $2^n$ combinations for $n$ variables) and is not suitable in practice. Therefore, one of the primary tasks of boolean algebra is the development of more economical methods of establishing the various kinds of relations which obtain in this algebra.
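The brute-force check just described is easy to state as code. The following Python sketch is an illustration of our own, not part of the original text; the helper names `equivalent`, `NOT`, `AND` and `OR` are hypothetical. It verifies relations (18)-(30) by running over all $2^n$ combinations of values, and its cost doubles with every added variable, which is why more economical methods are sought.

```python
from itertools import product

def NOT(a):
    return 1 - a

def AND(a, b):
    return a & b

def OR(a, b):
    return a | b

def equivalent(lhs, rhs, n):
    """True if the n-argument functions agree on all 2**n value combinations."""
    return all(lhs(*v) == rhs(*v) for v in product((0, 1), repeat=n))

# name -> (left side, right side, number of variables)
RELATIONS = {
    "(18) x v xy = x": (lambda x, y: OR(x, AND(x, y)), lambda x, y: x, 2),
    "(19) x(x v y) = x": (lambda x, y: AND(x, OR(x, y)), lambda x, y: x, 2),
    "(20) not not x = x": (lambda x: NOT(NOT(x)), lambda x: x, 1),
    "(21) not(xy) = not x v not y":
        (lambda x, y: NOT(AND(x, y)), lambda x, y: OR(NOT(x), NOT(y)), 2),
    "(22) not(x v y) = not x not y":
        (lambda x, y: NOT(OR(x, y)), lambda x, y: AND(NOT(x), NOT(y)), 2),
    "(23) x v not x = 1": (lambda x: OR(x, NOT(x)), lambda x: 1, 1),
    "(24) x not x = 0": (lambda x: AND(x, NOT(x)), lambda x: 0, 1),
    "(25) x 0 = 0": (lambda x: AND(x, 0), lambda x: 0, 1),
    "(26) x 1 = x": (lambda x: AND(x, 1), lambda x: x, 1),
    "(27) x v 0 = x": (lambda x: OR(x, 0), lambda x: x, 1),
    "(28) x v 1 = 1": (lambda x: OR(x, 1), lambda x: 1, 1),
    "(29) not 0 = 1": (lambda: NOT(0), lambda: 1, 0),
    "(30) not 1 = 0": (lambda: NOT(1), lambda: 0, 0),
}

for name, (lhs, rhs, n) in RELATIONS.items():
    print(name, "holds" if equivalent(lhs, rhs, n) else "FAILS")
```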

For the resolution of this problem we can make use of the previ- 
ously derived relations (10)-(30), applying them repeatedly and in var- 
ious combinations. For example, two-fold application of relation (12) 
makes it possible to establish the validity of the relation xxx = x, 
multiple application of relations (10) and (13) makes it possible to 
extend the laws of commutativity for disjunction and the product to any desired number of disjunctive terms and, correspondingly, cofactors.
Thus, there arises the possibility of proving various relations in 
boolean algebra by transforming their left and right sides using rela- 
tions (10)-(30). If in doing this we manage to reduce the left and 
right sides of some relation to the same formula, then the validity of 
the corresponding relation is thereby established. 

It is not clear a priori whether such a method makes it possible to derive all the relations existing in boolean algebra. In actuality, however, such a derivation is always possible. To establish this fact, let us define a standard type of formula to which we shall try to reduce all the formulas of boolean algebra. In reducing a particular formula $A$ of boolean algebra to the standard form we shall always fix some finite set $M$ of boolean variables $x_1, x_2, \dots, x_n$, necessarily including all the variables which occur in the formula in question. We shall call every product of the variables or their negations (i.e., a product of the form $\tilde{x}_{i_1} \tilde{x}_{i_2} \cdots \tilde{x}_{i_k}$) an elementary product if each letter occurs in the product no more than once.
For example, the products $x_1\bar{x}_2$ or $x_1 x_2 x_3$ are elementary, while the products $x_1\bar{x}_1$ or $x_2 x_2 x_3$ are nonelementary. We shall include among the elementary products the variables $x_i$ themselves and their negations $\bar{x}_i$, considering them as products consisting of a single cofactor. It is also convenient to consider the constant 1 as an elementary product, namely the product of zero (an empty set of) cofactors. The number of cofactors in a product is called its length. The elementary products for a selected set $M$ of $n$ variables can thus have any length from 0 to $n$ inclusive.

The elementary products of maximal length (in the present case, of length $n$) are customarily called constituents of unity for the selected set $M$ of variables. It is easy to see that every constituent of unity contains each variable of the set $M$ (either directly or negated) exactly once, and that the total number of such constituents is $2^n$.
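Both facts can be confirmed for a small case by direct enumeration. In this Python sketch (our own illustration; the function names are hypothetical) a constituent maps every variable to 1 (taken directly) or 0 (negated), and the plain-text form `~x` stands for $\bar{x}$. The enumeration also shows that on every set of values exactly one constituent equals unity.

```python
from itertools import product

def constituents(names):
    """Each constituent maps every variable to 1 (direct) or 0 (negated)."""
    return [dict(zip(names, signs)) for signs in product((1, 0), repeat=len(names))]

def value(constituent, point):
    """A product of letters equals 1 only if every letter is satisfied."""
    return int(all(point[v] == s for v, s in constituent.items()))

def show(constituent):
    """Plain-text rendering: '~x' stands for the negation of x."""
    return "".join(v if s else "~" + v for v, s in constituent.items())

names = ["x", "y", "z"]
cs = constituents(names)
print(len(cs), [show(c) for c in cs])
for bits in product((0, 1), repeat=len(names)):
    point = dict(zip(names, bits))
    # exactly one constituent takes the value 1 at each point
    print(bits, sum(value(c, point) for c in cs))
```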

The disjunction of any number of elementary products which does not contain two identical products is called a disjunctive normal form. A disjunctive normal form which consists exclusively of constituents of unity is called an ideal disjunctive normal form.

Just as in the case of the products, this definition does not exclude that the disjunction in question consists of a single term (disjunction of length 1) or even of an empty set of terms (disjunction of length 0). In the latter case the disjunction is taken equal to zero by definition. Thus, the formulas $0$, $x_1$, $x_1 \vee x_2 x_3$ and $1$ can be considered as disjunctive normal forms. The first of these formulas consists of an empty set of terms, the second of a single term, the third of two terms, and the fourth again of a single term, which is the elementary product of zero length.


Replacing throughout these definitions the disjunctions by products, the products by disjunctions, the (boolean) constant 0 by the (boolean) constant 1 and vice versa, we obtain respectively the definitions of the elementary disjunctions, the constituents of zero, the conjunctive normal form and the ideal conjunctive normal form.


In boolean algebra, because the interchange of zero and one transforms disjunction into conjunction and vice versa, there arises a characteristic duality between the properties of disjunction and conjunction (multiplication). Performing such an interchange, we can automatically obtain, for any property (relation) derived hereafter, its dual property (relation). In particular, to every property of the disjunctive normal forms this duality associates the corresponding property of the conjunctive normal forms. Since this association is accomplished almost automatically each time, we shall limit ourselves in what follows to the consideration of the disjunctive normal forms only.

Using relations (10), (11), (13)-(16), (23) and (26), we can transform any disjunctive normal form into its equivalent ideal disjunctive normal form. Let us consider the process of such a transformation using the example of the disjunctive normal form of three variables $x \vee \bar{y}z \vee \bar{x}y\bar{z}$, which for brevity we shall designate by the single letter $f$.

The third term of this formula is a constituent of unity and therefore does not require any transformation. To be a constituent of unity, the second term lacks the factor $x$ (i.e., $x$ or $\bar{x}$), and the first term lacks the factors $y$ and $z$. On the basis of relations (23) and (26) we can write $f = x(y \vee \bar{y})(z \vee \bar{z}) \vee \bar{y}z(x \vee \bar{x}) \vee \bar{x}y\bar{z}$. Using the first distributive law (relation (16)) and relations (10), (11), (13)-(15), we successively bring our form to

$$f = (xy \vee x\bar{y})(z \vee \bar{z}) \vee \bar{y}zx \vee \bar{y}z\bar{x} \vee \bar{x}y\bar{z} = xyz \vee xy\bar{z} \vee x\bar{y}z \vee x\bar{y}\bar{z} \vee x\bar{y}z \vee \bar{x}\bar{y}z \vee \bar{x}y\bar{z} = xyz \vee xy\bar{z} \vee x\bar{y}z \vee x\bar{y}\bar{z} \vee \bar{x}\bar{y}z \vee \bar{x}y\bar{z}.$$

The last expression in this chain of equalities is the desired ideal disjunctive normal form. We now establish the following important result.

Theorem 1. With the aid of relations (10)-(30) any formula of 
boolean algebra can be reduced to the ideal disjunctive normal form. 

Actually, by repeatedly using the de Morgan rules (21) and (22), the double negation law (20), and also relations (29) and (30), any formula $A(x_1, x_2, \dots, x_n)$ of boolean algebra can be reduced without difficulty to an equivalent formula $B(x_1, x_2, \dots, x_n, \bar{x}_1, \bar{x}_2, \dots, \bar{x}_n)$ which contains no negations other than those applied directly to the letters $x_1, x_2, \dots, x_n$. The transformations required in this case are easy to see from the example

EV v2 V yz = (eV ye) 2) =* 2) OV I=ZYVAGVA. 

The described technique of sequential dropping of the negation signs 
is applicable to any formula of boolean algebra. 
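The technique of driving negations down to the letters can be sketched recursively. In this Python illustration formulas are nested tuples such as `("not", ("or", A, B))`; this representation and the function names are our own assumptions. A pending negation is carried downward: at an `and`/`or` node it switches the operation (de Morgan rules (21), (22)), two negations cancel (20), and a negated constant is replaced using (29) and (30).

```python
from itertools import product

# Hypothetical formula syntax: ("var", name), ("const", 0 or 1),
# ("not", A), ("and", A, B), ("or", A, B)

def push_negations(f, negate=False):
    """Return an equivalent formula in which 'not' applies only to variables."""
    kind = f[0]
    if kind == "var":
        return ("not", f) if negate else f
    if kind == "const":
        return ("const", 1 - f[1]) if negate else f        # (29), (30)
    if kind == "not":
        return push_negations(f[1], not negate)            # (20)
    op = kind
    if negate:                                             # (21), (22)
        op = "or" if kind == "and" else "and"
    return (op, push_negations(f[1], negate), push_negations(f[2], negate))

def evaluate(f, env):
    """Value of formula f for the variable values given in env."""
    kind = f[0]
    if kind == "var":
        return env[f[1]]
    if kind == "const":
        return f[1]
    if kind == "not":
        return 1 - evaluate(f[1], env)
    a, b = evaluate(f[1], env), evaluate(f[2], env)
    return a & b if kind == "and" else a | b

x, y, z = ("var", "x"), ("var", "y"), ("var", "z")
A = ("not", ("or", ("not", x), ("and", y, z)))     # not(not x v yz)
B = push_negations(A)
print(B)   # the nested-tuple form of x(not y v not z)
```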

The formula $B(x_1, x_2, \dots, x_n, \bar{x}_1, \bar{x}_2, \dots, \bar{x}_n)$ is constructed from the letters (with or without negations) shown in its designation using only the multiplication and disjunction operations. Relations (10), (11), (13), (14) and (16) show that such expressions, exactly as in the usual school algebra course (with disjunction playing the role of addition), can be transformed to remove all the parentheses and to group all like terms. After such a transformation, and with subsequent use of relations (25), (26) and (27), our formula becomes a disjunction of certain products of the letters $x_1, x_2, \dots, x_n$ and their negations. With the aid of relations (10), (12), (24) and (25) all these products can be transformed into equivalent elementary products or zeros. Now, using relations (27) and (15), we reduce our formula to a disjunctive normal form and then, as in the example discussed above, to the ideal disjunctive normal form. Thereby the theorem is completely proved.
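The remaining steps of the proof (removing parentheses by the distributive law, discarding products that contain a letter together with its negation, and completing products to constituents of unity) can likewise be sketched in Python. The input is assumed to be a formula whose negations sit only on variables, written as nested tuples (a representation of our own choosing); products are frozensets of `(variable, sign)` pairs, with sign 1 for $x$ and 0 for $\bar{x}$.

```python
from itertools import product

def to_products(f):
    """Expand f into a set of elementary products (frozensets of (name, sign))."""
    kind = f[0]
    if kind == "const":
        return {frozenset()} if f[1] else set()   # 1: empty product; 0: empty disjunction
    if kind == "var":
        return {frozenset({(f[1], 1)})}
    if kind == "not":                             # assumed form: ("not", ("var", name))
        return {frozenset({(f[1][1], 0)})}
    left, right = to_products(f[1]), to_products(f[2])
    if kind == "or":
        return left | right                       # identical terms merge
    result = set()
    for p in left:                                # distributive law, term by term
        for q in right:
            r = p | q                             # repeated letters merge (xx = x)
            if not any((v, 1 - s) in r for v, s in r):
                result.add(r)                     # products with x and not x are 0, dropped
    return result

def ideal_dnf(products, names):
    """Complete every product with (v v not v) for each missing variable v."""
    result = set()
    for p in products:
        missing = [v for v in names if v not in {u for u, _ in p}]
        for signs in product((1, 0), repeat=len(missing)):
            result.add(p | frozenset(zip(missing, signs)))
    return result

x, y, z = ("var", "x"), ("var", "y"), ("var", "z")
f = ("and", ("or", x, y), ("or", ("not", x), z))  # (x v y)(not x v z)
prods = to_products(f)      # x not x is dropped; xz, (not x)y and yz remain
ideal = ideal_dnf(prods, ["x", "y", "z"])
print(len(prods), len(ideal))
```

Because sets are used throughout, duplicate products disappear automatically, which plays the role of relation (15) in the text.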

It is clear that the resulting ideal disjunctive normal form is equivalent to the original formula, since each of the steps described above uses equivalent transformations.

We note that all the steps performed are reversible, so that with the use of relations (10)-(30) we can also accomplish the reverse conversion from the constructed ideal disjunctive normal form to the original formula $A(x_1, x_2, \dots, x_n)$.

Theorem 2. For an arbitrary boolean function $f$ of any finite number of variables $x_1, x_2, \dots, x_n$ one can construct one and, up to permutation of the disjunctive terms and cofactors, only one ideal disjunctive normal form in the same set of variables which is equal to it.

To each set $(\alpha_1, \alpha_2, \dots, \alpha_n)$ of values of the variables $x_1, x_2, \dots, x_n$ there corresponds exactly one constituent of unity $\tilde{x}_1 \tilde{x}_2 \cdots \tilde{x}_n$ which becomes unity on this set. This constituent is uniquely defined by the condition $\tilde{x}_i = x_i$ if $\alpha_i = 1$ and $\tilde{x}_i = \bar{x}_i$ if $\alpha_i = 0$ ($i = 1, 2, \dots, n$). All the remaining constituents take the value zero on the given set of values of the variables. Since in a disjunction the terms equal to zero can be discarded, it becomes clear that the disjunction $g$ of the constituents of unity corresponding to all the sets on which the function $f$ equals unity is an ideal disjunctive normal form equal (as a boolean function) to the function $f$. It is also clear that every change in the composition of the constituents of unity occurring in the form $g$ will inevitably alter its table of values and thus destroy the established equality. Consequently, the form $g$ is uniquely determined by the function $f$, Q.E.D.

In view of this uniqueness, the form $g$ is customarily called the ideal disjunctive normal form of the function $f$ under consideration.
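The proof of theorem 2 is constructive: one constituent of unity is taken for each set of values on which $f$ equals unity. A minimal Python sketch, assuming (our own convention) that $f$ is given as a callable on 0/1 arguments and writing `~x` for $\bar{x}$:

```python
from itertools import product

def ideal_dnf_of(f, names):
    """One constituent per set (a_1, ..., a_n) with f = 1:
    the letter x_i is taken directly if a_i = 1 and negated if a_i = 0."""
    return [tuple(zip(names, point))
            for point in product((0, 1), repeat=len(names)) if f(*point)]

def evaluate_dnf(terms, env):
    """Value of the disjunction of constituents for the values in env."""
    return int(any(all(env[v] == s for v, s in t) for t in terms))

def show(terms):
    if not terms:
        return "0"                                  # empty disjunction
    return " v ".join("".join(v if s else "~" + v for v, s in t) for t in terms)

names = ["x", "y", "z"]
majority = lambda x, y, z: int(x + y + z >= 2)
g = ideal_dnf_of(majority, names)
print(show(g))   # ~xyz v x~yz v xy~z v xyz
```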

Two other important results follow directly from theorems 1 and 2. 
Theorem 3. Any boolean function can be represented in the form of a formula of boolean algebra.

Theorem 4. With the aid of relations (10)-(30) every formula of boolean algebra can be transformed into any other formula which is equivalent to it (i.e., which represents the same boolean function).

Actually, as the formula representing any given boolean function $f$ we can choose its ideal disjunctive normal form. We can always transform any formula $A$ into an equivalent formula $B$ by way of the ideal disjunctive normal form $g$, which, by theorem 2, is common to the formulas $A$ and $B$. The chain of transformations which turns the formula $A$ into $g$, followed by the chain reducing $B$ to $g$ taken in reverse order (by theorem 1 such chains exist), constitutes a chain of transformations which turns the formula $A$ into the formula $B$.

We note that not all of the relations (10)-(30) written out above are used in the transformations in the proof of theorem 1 (for example, relation (17) is not used). Therefore, if desired, the system of relations (10)-(30) can be shortened in such a way that theorems 2 and 4 remain valid.

The second remark concerns the fact that the method of transform- 
ing the formula A into its equivalent formula B by means of the ideal 
disjunctive normal form g common to both of them was necessary only to 
establish the principle of the possibility of conversion from A to B. 
In practice this method usually turns out to be too cumbersome, as a 
result of which we generally look for more direct ways to convert from 
A to B (although, of course, sometimes there may not be a way which is 
significantly shorter to get from A to B than the "roundabout" method 
indicated above). 

An important problem which is solvable within the framework of boolean algebra is the problem of the minimization of formulas. The sense of this problem is to find a general technique (algorithm) which makes it possible, for any formula of boolean algebra, to find an equivalent formula of the smallest possible complexity.

As the criterion of the complexity of a formula it is most natural to take the number of operations appearing in it, so that, for example, the complexity of the formula $\bar{x}$ will be the number 1, while the complexity of the formula $(\bar{x} \vee y)(\bar{y} \vee z)$ will be the number 5 (two negations, two disjunctions and one multiplication). However, following the tradition established in most works on the minimization problem, we shall use a different criterion, taking the complexity of a formula to be the total number of letters appearing in it. Here we mean the number of occurrences of letters (counting repeated occurrences of the same letter) and not the number of different letters in the formula. Thus, for instance, by this criterion the complexity of the formula $(x \vee y)(\bar{x} \vee \bar{y})$ should be considered 4, not 2.
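Both complexity measures are simple recursive counts over the structure of a formula. In this Python sketch (formulas as nested tuples and the function names are our own assumptions) constants are not counted as letters, which is also an assumption on our part.

```python
# Hypothetical syntax: ("var", name), ("const", c), ("not", A), ("and", A, B), ("or", A, B)

def letters(f):
    """Number of letter occurrences (the measure adopted in the text)."""
    if f[0] == "var":
        return 1
    if f[0] == "const":
        return 0              # assumption: constants are not counted as letters
    return sum(letters(arg) for arg in f[1:])

def operations(f):
    """Number of negations, multiplications and disjunctions."""
    if f[0] in ("var", "const"):
        return 0
    return 1 + sum(operations(arg) for arg in f[1:])

def variables(f):
    """Set of different letters in f."""
    if f[0] == "var":
        return {f[1]}
    if f[0] == "const":
        return set()
    return set().union(*(variables(arg) for arg in f[1:]))

x, y, z = ("var", "x"), ("var", "y"), ("var", "z")
A = ("and", ("or", ("not", x), y), ("or", ("not", y), z))   # (not x v y)(not y v z)
B = ("and", ("or", x, y), ("or", ("not", x), ("not", y)))   # (x v y)(not x v not y)
print(operations(A), letters(B), len(variables(B)))        # 5 4 2
```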

It is not difficult to understand that the set M(A) of the differ- 
ent formulas of boolean algebra whose complexity does not exceed the 
complexity of any fixed formula A will inevitably be finite. Therefore 
the problem formulated above of the minimization of formulas can in 
principle be resolved by the sorting of all the formulas of the set 
M(A) in the order of increasing complexity until a formula is found 
which is equivalent to formula A. However, the algorithm based on this sorting is so cumbersome that it is not suitable in practice.

The problem of constructing more economical algorithms for the minimization of formulas in boolean algebra has not yet been solved in general form. Therefore, in practice we limit ourselves, as a rule, to the problem of finding the minimal formula in a particular class of formulas, first of all in the class of all disjunctive normal forms. This problem is usually called the problem of the minimization of boolean functions, which, of course, is not entirely accurate, since what is minimized is not the function (which remains unchanged in the minimization process) but the formulas which represent it (in the present case, the disjunctive normal forms).

All the methods of minimization in the class of disjunctive normal forms are based on the concept of the prime implicant. An implicant of the boolean function $f$ is any boolean function $g$ which can take the value unity only on those sets of values of the variables on which the function $f$ takes the value unity. We say that the implicant $g$ covers with its unities some of the unities of the function $f$. From the properties of disjunction it follows that the disjunction of any (finite) set of implicants $g_1, g_2, \dots, g_k$ of the function $f$ is again an implicant of it. If, moreover, the unities of the implicants $g_1, g_2, \dots, g_k$, taken all together, cover all the unities of the function $f$, then this disjunction simply coincides with the function $f$: $g_1 \vee g_2 \vee \dots \vee g_k = f$.

The converse is also clear: any term of a disjunction coinciding with the function $f$ is an implicant of this function, and the unities of all the terms of the disjunction, taken together, cover all the unities of the function $f$. In particular, every disjunctive normal form $g$ of the boolean function $f$ can be considered as a covering of this function by the set of all the terms of $g$, each of which is an implicant of $f$. In this case the elementary products play the role of implicants.

We note that as the length of an elementary product decreases (as a result of dropping some of its cofactors), the number of unities covered by it increases. An elementary product of maximal length (a constituent of unity) in $n$ variables reduces to unity at only one point, while an elementary product of length $n - k$ reduces to unity at $2^k$ points. Therefore it is advantageous to cover a given function $f$ by elementary products of the smallest possible length, i.e., by elementary products which are themselves implicants of the function $f$ while none of their proper parts (products obtained by dropping some cofactors) is an implicant of this function. Such elementary products are customarily called prime implicants of the boolean function in question.

The set of all prime implicants of any boolean function $f$ covers all its ones. Therefore the disjunction $g$ of all the prime implicants of the function $f$, called the reduced disjunctive normal form of this function, represents $f$. However, this representation will usually not be the most economical, since some prime implicants can cover ones which are already covered by the remaining implicants. Discarding such redundant implicants from the form $g$ one after another, as long as possible, we transform it into a so-called irreducible disjunctive normal form of the function $f$.

A boolean function can have, generally speaking, not one but several irreducible disjunctive normal forms. For instance, the function of the three variables $x, y, z$ which equals zero only on the sets (000) and (111) and unity on all the remaining sets has five different irreducible disjunctive normal forms. At the same time, one can show that any boolean function of two variables has a single irreducible disjunctive normal form.
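Both statements can be checked by brute force for such small numbers of variables. The Python sketch below is our own illustration; it is exponential in the number of variables and is not one of the economical algorithms the text refers to. It finds the prime implicants by testing every elementary product and then lists every set of prime implicants that covers all unities of $f$ and from which no term can be dropped. A product is encoded (our assumption) as a tuple whose $i$-th entry is 1 for $x_i$, 0 for $\bar{x}_i$ and `None` if $x_i$ is absent.

```python
from itertools import combinations, product

def covers(term, point):
    """The product equals 1 at `point` iff every present letter is satisfied."""
    return all(t is None or t == p for t, p in zip(term, point))

def prime_implicants(ones, n):
    points = list(product((0, 1), repeat=n))

    def is_implicant(term):
        # the product may equal 1 only where f equals 1
        return all(p in ones for p in points if covers(term, p))

    primes = []
    for term in product((1, 0, None), repeat=n):
        # prime: dropping any single cofactor destroys the implicant property
        if is_implicant(term) and not any(
            is_implicant(term[:i] + (None,) + term[i + 1:])
            for i in range(n) if term[i] is not None
        ):
            primes.append(term)
    return primes

def irreducible_dnfs(ones, n):
    """All sets of prime implicants that cover every unity of f
    and from which no term can be dropped."""
    primes = prime_implicants(ones, n)

    def covered(terms):
        return all(any(covers(t, p) for t in terms) for p in ones)

    forms = []
    for k in range(len(primes) + 1):
        for subset in combinations(primes, k):
            if covered(subset) and not any(
                covered(subset[:i] + subset[i + 1:]) for i in range(k)
            ):
                forms.append(subset)
    return forms

points3 = set(product((0, 1), repeat=3))
ones = points3 - {(0, 0, 0), (1, 1, 1)}
print(len(prime_implicants(ones, 3)), len(irreducible_dnfs(ones, 3)))   # 6 5

points2 = list(product((0, 1), repeat=2))
counts = set()
for mask in range(16):                       # all 16 functions of two variables
    ones2 = {p for i, p in enumerate(points2) if (mask >> i) & 1}
    counts.add(len(irreducible_dnfs(ones2, 2)))
print(counts)                                # {1}
```

For the three-variable example, two of the five irreducible forms have three terms and three have four, so only the two shorter ones are minimal.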

It is easy to see that the irreducible disjunctive normal forms of any boolean function $f$ necessarily include all its minimal disjunctive normal forms (there may be several of them), i.e., those disjunctive normal forms of $f$ which contain the smallest number of letters in comparison with all the remaining disjunctive normal forms of this function.

We can construct sufficiently economical algorithms for finding 
all the prime implicants and all the irreducible disjunctive normal 
forms of any boolean function. However, for singling out the minimal forms from among the irreducible disjunctive normal forms there is, in the general case, no method significantly simpler than sequentially sorting through and comparing all the irreducible disjunctive normal forms (see Zhuravlev [36]).

One of the most effective algorithms for finding the prime implicants and the irreducible disjunctive normal forms is the algorithm proposed by Blake [8]. Its essence is the following. It is not difficult to establish that boolean algebra satisfies the identity relation

$$AB \vee \bar{A}C = AB \vee \bar{A}C \vee BC. \tag{31}$$

If in this relation we take $A$ to be a letter and $B$ and $C$ to be elementary products, then relation (31) yields a rule for the identity transformation of disjunctive normal forms: if a form contains two terms of the form $xp$ and $\bar{x}q$, it may be supplemented with the new term (elementary product) $pq$. It is possible, of course, that this term vanishes (when $p$ and $q$ contain some letter with opposite signs) or coincides with one of the terms already present in the form. Since the total number of elementary products in the given variables is finite, after a finite number of applications of this rule no new terms will be obtained. Blake's result amounts to the statement that the transformed form of $f$, after reaching such "stabilization", contains all the prime implicants of the boolean function which it represents.

After obtaining a disjunctive normal form $g$ containing all the prime implicants of the function, it is not difficult to free it of all the terms which are not prime implicants. Indeed, if some elementary product $P$ from $g$ is not a prime implicant, then, being in any case an implicant of the function $g$, it contains some prime implicant $p$ of this function as a part and, consequently, can be represented in the form $P = pq$. Since $g$ contains the disjunctive term $p$, the term $P = pq$ can be excluded from $g$ with the aid of relation (18): $p \vee pq = p$. Such an exclusion is usually called the elementary absorption operation. Applied to the disjunctive normal form $g$, it removes after a finite number of steps all the terms which are not prime implicants and thus converts $g$ into the reduced disjunctive normal form $g_0$.
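The two stages just described, completing the form by relation (31) until nothing new appears and then applying elementary absorption (18), can be sketched in Python as follows. Terms are frozensets of `(variable, sign)` pairs, sign 1 for $x$ and 0 for $\bar{x}$; this encoding and the function names are our own. The new term is formed only when exactly one letter clashes, since with two or more clashing letters the product $pq$ vanishes.

```python
def consensus(p, q):
    """Relation (31): from x*P and (not x)*Q derive P*Q, or None if it vanishes."""
    clashes = [v for v, s in p if (v, 1 - s) in q]
    if len(clashes) != 1:
        return None          # no clashing letter, or P*Q contains some y*(not y) = 0
    v = clashes[0]
    return (p | q) - {(v, 0), (v, 1)}

def blake_closure(terms):
    """Add consensus terms until no new term appears (there are finitely many products)."""
    terms = set(terms)
    while True:
        new = set()
        for p in terms:
            for q in terms:
                c = consensus(p, q)
                if c is not None and c not in terms:
                    new.add(c)
        if not new:
            return terms
        terms |= new

def absorb(terms):
    """Elementary absorption (18): drop P = p*q whenever p is also a term."""
    return {t for t in terms if not any(s < t for s in terms)}

def T(*literals):
    return frozenset(literals)

xy, nxz = T(("x", 1), ("y", 1)), T(("x", 0), ("z", 1))
reduced = absorb(blake_closure({xy, nxz}))    # f = xy v (not x)z
print(sorted(sorted(t) for t in reduced))     # xy, (not x)z and the new term yz
```

Here `s < t` is the proper-subset test on frozensets, so a term is absorbed exactly when some other term is a part of it.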

In order to go from the form $g_0$ to an irreducible disjunctive normal form, we can find the redundant terms in $g_0$ by the same method of Blake: if some term of the form $g_0$ (or of any other disjunctive normal form consisting of prime implicants) can be obtained from the remaining terms by applying relation (31), possibly more than once, then this term is redundant and can be excluded.

By applying such an exclusion process repeatedly, we reduce the form $g_0$ to an irreducible disjunctive normal form $g_1$. Indeed, on the strength of Blake's result presented above, with the aid of relation (31) we can obtain from the form $g_1$ all the prime implicants appearing in $g_0$. But no further exclusion of terms of the form $g_1$ is possible: if we attempt such an exclusion, at least one of the unities of the function represented by $g_1$ will be uncovered. It is clear that the prime implicant (excluded from $g_1$) covering this unity cannot then be recovered from the disjunction of the remaining terms by any elementary transformations, in particular by applying identity (31).
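The exclusion of redundant terms can be sketched in Python as well. Instead of deriving a term from the others by relation (31), this sketch (our own substitution) checks directly on the table of values whether the remaining terms already cover every unity of the term; for prime implicants the two tests lead to the same decisions. Terms are frozensets of `(variable, sign)` pairs, an encoding of our choosing.

```python
from itertools import product

def holds(term, point):
    """The product equals 1 at `point` iff every letter is satisfied."""
    return all(point[v] == s for v, s in term)

def is_redundant(term, others, names):
    """Truth-table check: every unity of `term` is already covered by `others`."""
    for bits in product((0, 1), repeat=len(names)):
        point = dict(zip(names, bits))
        if holds(term, point) and not any(holds(o, point) for o in others):
            return False
    return True

def make_irreducible(terms, names):
    """Exclude redundant terms one at a time, in the given order.
    A different order may produce a different irreducible form."""
    kept = list(terms)
    i = 0
    while i < len(kept):
        others = kept[:i] + kept[i + 1:]
        if is_redundant(kept[i], others, names):
            kept = others        # drop it; the next term moves into position i
        else:
            i += 1               # dropping later terms cannot make this one redundant
    return kept

def T(*literals):
    return frozenset(literals)

names = ["x", "y", "z"]
xy, nxz, yz = T(("x", 1), ("y", 1)), T(("x", 0), ("z", 1)), T(("y", 1), ("z", 1))
print(make_irreducible([yz, xy, nxz], names) == [xy, nxz])   # True: yz is redundant
```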

In order to obtain all the irreducible disjunctive normal forms, the described exclusion method should be applied several times, varying the order in which the attempts are made to exclude the various terms. As we mentioned above, finding the minimal disjunctive normal forms requires the laborious operation of sorting through all the irreducible disjunctive normal forms (which can be of varying complexity). Therefore, in practice, the solution of the minimization problem is usually limited to finding some one, arbitrarily selected, irreducible disjunctive normal form.

As an example of the application of the Blake algorithm, we shall show the process of minimization of the disjunctive normal form $f = \bar{x}yz \vee xy \vee x\bar{y}\bar{z}$.

Applying relation (31) to the pair composed of the first term and the second, and to the pair composed of the second term and the third, we reduce the given form $f$ to the form $f_1 = \bar{x}yz \vee xy \vee x\bar{y}\bar{z} \vee yz \vee x\bar{z}$. Application of relation (31) to any pair of terms of the form $f_1$ does not lead to the appearance of new terms. Consequently, all the prime implicants of the function $f$ (the function represented by the form $f$) are contained in the form $f_1$.

The application of the operation of elementary absorption to the form $f_1$ leads to the reduced disjunctive normal form $f_2 = xy \vee yz \vee x\bar{z}$. The first term of the form $f_2$ can be obtained with the aid of relation (31) from the remaining two terms and is thus redundant. Excluding it, we come to the form $f_3 = yz \vee x\bar{z}$, which contains no redundant terms and is, consequently, the desired irreducible disjunctive normal form. In the present case the irreducible disjunctive normal form is unique, and as a result it coincides with the minimal disjunctive normal form.

More detailed information on the various methods of minimization of the formulas of boolean algebra can be found in special monographs on the theory of the synthesis of circuits of discrete automata (see, for example, Glushkov [26]). Some additional information on this question is also presented in §4 of the present chapter.

§3. THE CONCEPT OF COMPLETE SETS OF BOOLEAN OPERATIONS 

Theorem 3 of the preceding section shows that, for the representation of any boolean function in the form of a formula constructed from the arguments and the boolean constants 0 and 1, it suffices to use only three types of boolean operations: negation, multiplication and disjunction. Every set of boolean operations which possesses such a property is customarily called a complete set.

In addition to the set consisting of the operations of negation, 
multiplication and disjunction, we can also construct other complete 
sets of boolean operations. From the de Morgan relations (21) and (22) 
written out in the preceding section, it follows that the disjunction 
operation can be represented by the operations of negation and multi- 
plication, and that the multiplication operation can be represented by 
the operations of negation and disjunction. Therefore a complete set 
of boolean operations can be composed from the negation operation and 
any of the two remaining operations of boolean algebra (multiplication 
or disjunction). 

From the operations of any complete set of boolean operations one can construct any boolean operation, in particular the operations of negation, disjunction and multiplication. In order to perform the required construction it is obviously sufficient to represent, using the operations of the complete set under consideration, the boolean function which defines the required operation. Conversely, if from the operations of some set we can construct the operations of negation and multiplication, or negation and disjunction, then, in view of what we have said above, this is sufficient for representing any boolean function and, consequently, for the completeness of our set. As a result we arrive at the following proposition.

Theorem 1. For the completeness of a set of boolean operations it is necessary and sufficient that the operations of this set make it possible to construct the function $\bar{x}$ and one of the functions $xy$ or $x \vee y$.

Using the completeness criterion of Theorem 1, we can establish the completeness of many sets of boolean operations relatively easily. One such set, in particular, is the set consisting of the operations of multiplication and addition (modulo two). Indeed, it is easy to verify that the following relation is valid:

$$\bar{x} = x + 1. \tag{32}$$

Thus, negation can be expressed by addition (with the constant 1). Since multiplication itself appears in the set in question, on the basis of Theorem 1 we arrive at the conclusion that this set is complete.
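Relation (32), and the construction of disjunction from multiplication and addition modulo two, can be confirmed exhaustively. In this Python sketch `^` is addition modulo two and `&` is multiplication on 0/1 values; the function names are our own. Disjunction is obtained in two ways: by the formula $x + y + xy$, which can be checked directly, and through the de Morgan rule from negation and multiplication.

```python
from itertools import product

def NOT(x):
    return x ^ 1                      # relation (32): not x = x + 1

def OR_poly(x, y):
    return x ^ y ^ (x & y)            # x v y = x + y + xy

def OR_de_morgan(x, y):
    return NOT(NOT(x) & NOT(y))       # de Morgan rule (22) followed by negation

for x, y in product((0, 1), repeat=2):
    print(x, y, NOT(x), OR_poly(x, y), OR_de_morgan(x, y))
```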

With the aid of the operations of addition and multiplication 
there is constructed still another interesting algebra of the boolean 
functions, termed the Zhegalkin algebra. In its general properties (ex- 
pressed by the identity relations) this algebra approaches most close- 
ly the algebra with the conventional addition and multiplication opera- 
tions which is studied in high school. Like conventional addition, mod- 
ulo two addition satisfies the associativity and commutativity rela- 
tions (for boolean multiplication these properties were established in 
the preceding section). The distributive law $x(y + z) = xy + xz$ is also satisfied, making it possible to remove parentheses in expressions just as in conventional algebra.

After removal of the parentheses, any formula of the Zhegalkin algebra is represented in the form of a sum of products of the variables, including, possibly, products consisting of a single cofactor (single letters) and of zero cofactors (the constant 1). On the basis of the relation $xx = x$ and the commutativity of multiplication, we can assume that no letter occurs more than once in any of the products obtained.

Identical products, just as in conventional algebra, are considered like terms and are subject to the operation of collecting like terms. The rules for this operation differ from the corresponding rules in conventional algebra, amounting, in the final analysis, to the easily verifiable identity relation

$$x + x = 0. \tag{33}$$

Thus, any even number of identical addends mutually cancel, while any odd number is equivalent to a single addend, since zero addends do not alter the value of the sum and can be immediately struck from it.

Collecting like terms completes our description of the process, which we shall call the reduction of formulas of the Zhegalkin algebra to canonical form. Let us demonstrate this process with an example. Let there be given the formula $f = (x + y)(x + z) + y(z + x)$ of the Zhegalkin algebra. After removal of the parentheses, this formula takes the form $f_1 = x + xz + xy + yz + yz + xy$. After collecting like terms, the terms $xy$ and $yz$, each occurring twice in the formula, cancel, and the formula is reduced to the final (canonical) form $f_2 = x + xz$.
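The canonical polynomial can also be computed directly from the table of values of $f$, without manipulating parentheses. The Python sketch below uses the standard inversion formula rather than the procedure of the text: the coefficient of the product of a set $S$ of variables is the sum modulo two of the values of $f$ on all sets whose unities lie within $S$. The function names are our own. It reproduces the result $x + xz$ of the example.

```python
from itertools import product

def canonical_polynomial(f, names):
    """Monomials (frozensets of variable names) with coefficient 1 mod 2.
    The empty frozenset stands for the constant 1."""
    n = len(names)
    monomials = set()
    for mask in product((0, 1), repeat=n):
        coeff = 0
        for point in product((0, 1), repeat=n):
            if all(p <= m for p, m in zip(point, mask)):   # unities of point lie in mask
                coeff ^= f(*point)
        if coeff:
            monomials.add(frozenset(v for v, m in zip(names, mask) if m))
    return monomials

def show(monomials):
    """Render a polynomial, shorter products first."""
    if not monomials:
        return "0"
    terms = [("".join(sorted(m)) or "1") for m in monomials]
    return " + ".join(sorted(terms, key=lambda t: (len(t), t)))

f = lambda x, y, z: ((x ^ y) & (x ^ z)) ^ (y & (z ^ x))   # (x + y)(x + z) + y(z + x)
print(show(canonical_polynomial(f, ["x", "y", "z"])))   # x + xz
```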

In view of the completeness of the set of operations of the Zhegalkin algebra and the possibility of reducing any of its formulas to canonical form, every boolean function $f$ can be represented in the Zhegalkin algebra by a formula of canonical form. It is not difficult to show that this formula is uniquely determined by the function $f$ up to permutation of addends and cofactors. We shall call this formula the canonical polynomial of the given boolean function $f$.

The uniqueness of the canonical polynomial can be established by
simple reasoning. Let $f_1$ and $f_2$ be two different canonical
polynomials of the boolean function f. Since both are equal to this
function, the polynomials $f_1$ and $f_2$ are equal to one another as
functions (for all values of the variables). In the identity
$f_1 = f_2$, identical terms in the right and left sides can be
cancelled in pairs. In the identity $f_1' = f_2'$ which remains after
this cancellation, the right and left sides do not contain a single
pair of identical terms (addends), and at least one side is not empty,
since $f_1$ and $f_2$ were different.

Let us fix one of the addends p (in either side) which is composed
of the smallest number of letters in comparison with the remaining
addends. Then each of the remaining addends contains at least one
letter that does not occur in p (otherwise it would consist only of
letters of p and, having no fewer letters than p, would coincide with
p). Let us fix the set of values of the variables so that all the
letters appearing in p take the value 1 and all the remaining letters
take the value 0. On the strength of this remark, only one of the
addends, precisely the addend p, becomes unity on the selected set of
values of the variables, and all the remaining addends are equal to
zero. But then the relation $f_1' = f_2'$ takes the form 1 = 0 (or
0 = 1), which is impossible, since the original relation was an
identity. Thereby we have refuted the assumption, made at the
beginning of our discussion, that there exist two different (although
equal to one another as functions) canonical polynomials $f_1$ and
$f_2$ for the same boolean function f.
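For a small number of variables the uniqueness can also be confirmed by exhaustion. There are $2^{2^n}$ canonical polynomials in n letters and exactly as many boolean functions of n variables. Uniqueness therefore amounts to the statement that distinct polynomials never coincide as functions. The Python sketch below (an illustration only; the names are hypothetical) checks this by brute force.

```python
from itertools import product

def evaluate(coeffs, point):
    """Value of a canonical polynomial on the set of values encoded by the bits of point.

    coeffs[mask] == 1 means that the product of the letters marked by the bits of mask occurs.
    """
    value = 0
    for mask, c in enumerate(coeffs):
        # a product equals 1 exactly when every one of its letters equals 1
        if c and (mask & point) == mask:
            value ^= 1
    return value

def polynomials_are_distinct(n):
    """True if the 2**(2**n) canonical polynomials in n letters give pairwise different functions."""
    size = 1 << n
    tables = {tuple(evaluate(coeffs, p) for p in range(size))
              for coeffs in product((0, 1), repeat=size)}
    return len(tables) == 2 ** size

print(polynomials_are_distinct(3))  # True
```

Exhaustion is only a sanity check for small n. The counting argument it relies on is the same for every n, but the proof above is what establishes the general case.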

Boolean functions whose canonical polynomials do not contain products
of two or more variables (i.e., polynomials which are the sum of
individual letters and, possibly, the constant 1, as well as the
polynomial identically equal to zero) are called linear boolean
functions. All the remaining boolean functions are nonlinear.
Corresponding to this division of the functions, the boolean
operations which they determine are also divided into linear and
nonlinear operations.

It is easy to see that any superposition (substitution of one
function into another) of linear boolean functions is again a linear
function, since substituting a sum of letters and constants for a
letter of such a sum again yields a sum of letters and constants. This
means, clearly, that with the aid of linear boolean operations we
cannot construct any nonlinear operation. Hence every complete set of
boolean operations must include at least one nonlinear operation.

The operations of negation and addition (modulo two) are linear,
since the canonical polynomials representing their boolean functions
are the linear formulas $x + 1$ and $x + y$. At the same time the
functions $xy$ and $x \vee y$, and consequently the multiplication and
disjunction operations which they define, are nonlinear. The first of
them has as its canonical polynomial the formula $xy$, and the second
the formula $x + y + xy$. Both these formulas contain the nonlinear
term $xy$.

Let us introduce still another division of the boolean functions
and their corresponding boolean operations into two classes: the class
of monotone functions (operations) and the class of nonmonotone func-
tions (operations). To do this let us define for the sets of values of
the boolean variables the order relation $\leq$, assuming that
$0 \leq 0$, $0 \leq 1$, $1 \leq 1$, and that for any two sets of
identical length $(\alpha_1, \alpha_2, \ldots, \alpha_n)$ and
$(\beta_1, \beta_2, \ldots, \beta_n)$ the relation
$(\alpha_1, \alpha_2, \ldots, \alpha_n) \leq (\beta_1, \beta_2, \ldots, \beta_n)$
holds if and only if $\alpha_i \leq \beta_i$ for all
$i = 1, 2, \ldots, n$. If these sets are different, then we shall say
that the first of them is the smaller and the second is the larger. We
note that certain sets, for example the sets (01) and (10), are in
this case incomparable with one another, since the definition presented
does not make it possible to consider either of them larger or smaller
than the other.
The boolean function f is termed monotonic if, on transition from
any smaller set A of values of its variables to any larger (in
comparison with A) set B, the value of the function cannot diminish,
i.e., it cannot pass from the value 1 on the set A to the value 0 on
the set B. If, however, for even one pair of sets A, B such that
A < B we have f(A) = 1 and f(B) = 0, then the function f is termed a
nonmonotonic boolean function. The boolean operations determined by
these functions are correspondingly divided into monotonic and
nonmonotonic.
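The order relation and the definition of monotonicity translate directly into a brute-force test over all pairs of sets. Below is a minimal Python sketch in which functions take 0/1 arguments; the names are hypothetical.

```python
from itertools import product

def precedes(a, b):
    """The order relation of the text: a <= b iff a_i <= b_i for every position i."""
    return all(ai <= bi for ai, bi in zip(a, b))

def is_monotone(f, n):
    """f is nonmonotone iff some pair A <= B has f(A) = 1 and f(B) = 0."""
    sets = list(product((0, 1), repeat=n))
    return not any(f(*a) == 1 and f(*b) == 0
                   for a in sets for b in sets if precedes(a, b))

print(precedes((0, 1), (1, 0)), precedes((1, 0), (0, 1)))  # False False: incomparable
print(is_monotone(lambda x, y: x | y, 2), is_monotone(lambda x, y: x ^ y, 2))  # True False
```

Comparing all pairs costs $4^n$ evaluations. This is acceptable for the one-place and two-place operations discussed here, but not for large n.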

For any superposition (substitution of a function into a function)
of monotone boolean functions we again obtain a monotone function.
Indeed, let the functions $f(y_1, y_2, \ldots, y_n)$ and
$\varphi(x_1, x_2, \ldots, x_m)$ be monotone, and let the function
$\varphi$ be substituted, say, in place of the variable $y_1$. Then, on
the strength of the monotonicity of $\varphi$, with any increase of the
set of values of the variables
$y_2, y_3, \ldots, y_n, x_1, x_2, \ldots, x_m$ the set of values of
$\varphi, y_2, y_3, \ldots, y_n$ will either increase or remain
unchanged. In both cases the value of the composite function
$f(\varphi, y_2, y_3, \ldots, y_n)$, in view of the monotonicity of
$f(y_1, y_2, \ldots, y_n)$, cannot diminish, which proves its
monotonicity.

In terms of operations, the fact just established means that with
the aid of monotone boolean operations alone we cannot construct any
nonmonotone boolean operation (for example, negation). Hence every
complete set of boolean operations must of necessity include at least
one nonmonotone operation.

The simplest example of a nonmonotone operation is negation. It
turns out, moreover, that in a certain sense every nonmonotone
operation contains the negation operation. More precisely, the
following proposition is valid.

Theorem 2. The negation operation can be constructed with the aid
of any nonmonotone boolean operation.

Let us consider an arbitrary nonmonotone boolean operation. It is
defined by some nonmonotone boolean function
$f(x_1, x_2, \ldots, x_n)$. In view of the nonmonotonicity of the
function f, there are two sets A and B of values of its variables such
that A is smaller than B, and the function f takes the value 1 on the
set A and the value 0 on the set B. The set A differs from the set B
in that in certain of the positions where the set B has ones, the set
A has zeros. Replacing these zeros by ones sequentially, one at a
time, we sooner or later arrive from the set A, where f(A) = 1, at the
set B, where f(B) = 0. Consequently, at one of the successive
replacements of a zero by a one the value of the function f must
change from 1 to 0. This means that for some i ($1 \leq i \leq n$)
$f(\alpha_1, \ldots, \alpha_{i-1}, 0, \alpha_{i+1}, \ldots, \alpha_n) = 1$
and
$f(\alpha_1, \ldots, \alpha_{i-1}, 1, \alpha_{i+1}, \ldots, \alpha_n) = 0$,
where $\alpha_1, \ldots, \alpha_{i-1}, \alpha_{i+1}, \ldots, \alpha_n$
are certain boolean constants (0 or 1). But then, as is easy to see,
the boolean function
$f(\alpha_1, \ldots, \alpha_{i-1}, x, \alpha_{i+1}, \ldots, \alpha_n)$
of the single variable x can be nothing other than the negation of
this variable, i.e., $\bar{x}$. Interpreting the function f as a
boolean operation, we obtain the required representation of negation
with the aid of this operation (and the constants 0 and 1).
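The proof of Theorem 2 is constructive. Scanning the pairs of sets that differ by a single zero replaced by a one finds the position i and the constants $\alpha_k$. Below is a minimal Python sketch; the function names are hypothetical, and functions take 0/1 arguments.

```python
from itertools import product

def negation_from(f, n):
    """Find i and constants such that fixing all arguments except the i-th yields negation.

    Returns (i, consts), where consts[i] is None and the other entries are the constants,
    or None when no such step exists, i.e. when f is monotone.
    """
    for a in product((0, 1), repeat=n):
        for i in range(n):
            if a[i] == 0:
                b = a[:i] + (1,) + a[i + 1:]  # replace a single zero by a one
                if f(*a) == 1 and f(*b) == 0:
                    return i, a[:i] + (None,) + a[i + 1:]
    return None

def restrict(f, consts):
    """The one-variable function obtained by putting x in the free position."""
    return lambda x: f(*[x if c is None else c for c in consts])

sheffer = lambda x, y: 1 - (x & y)
i, consts = negation_from(sheffer, 2)
neg = restrict(sheffer, consts)
print(i, consts, neg(0), neg(1))  # 0 (None, 1) 1 0
```

The search only inspects neighbouring sets. This is sufficient because, as the proof shows, any pair A < B with f(A) = 1 and f(B) = 0 forces such a neighbouring pair to exist.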

In the classification of the boolean operations we shall exclude
from consideration the zero-place operations (the constants 0 and 1),
and also the trivial single-place operation which repeats the value of
its argument. It is also natural not to distinguish between operations
which arise from the same boolean function by permutation of its
arguments. Taking account of this, we shall have a single one-place
operation, negation $\bar{x}$, and eight two-place operations:
multiplication $xy$, disjunction $x \vee y$, addition $x + y$, the
equivalence operation $x \sim y$, the implication $x \supset y$, the
inhibit operation $x\bar{y}$, the Sheffer operation
$\bar{x} \vee \bar{y}$ and the Pierce operation $\bar{x}\bar{y}$.

It is easy to verify that of all nine boolean operations listed,
only two are monotone: multiplication and disjunction. Only three
operations are linear: negation, addition and equivalence. Thus, we
arrive at the following result.

Theorem 3. Among the nine single-place and two-place boolean oper- 
ations, those of multiplication and disjunction are nonlinear (but mon- 
otone), those of negation, addition and equivalence are nonmonotone 
(but linear). The remaining four operations, inhibit, implication, 
Sheffer and Pierce, are both nonlinear and nonmonotone. 
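Theorem 3 can be confirmed by direct computation. The Python sketch below is an illustration with hypothetical names. It encodes the nine operations by their values, taking implication as $\bar{x} \vee y$ and the Sheffer and Pierce operations as $\bar{x} \vee \bar{y}$ and $\bar{x}\bar{y}$. Linearity is tested through the canonical polynomial, which is computed by the binary Möbius transform, a standard technique not described in the text. Monotonicity is tested by comparing all pairs of sets.

```python
from itertools import product

# The nine operations of the text, on 0/1 arguments: name -> (arity, function)
OPERATIONS = {
    "negation": (1, lambda x: 1 - x),
    "multiplication": (2, lambda x, y: x & y),
    "disjunction": (2, lambda x, y: x | y),
    "addition": (2, lambda x, y: x ^ y),
    "equivalence": (2, lambda x, y: 1 - (x ^ y)),
    "implication": (2, lambda x, y: (1 - x) | y),
    "inhibit": (2, lambda x, y: x & (1 - y)),
    "Sheffer": (2, lambda x, y: (1 - x) | (1 - y)),
    "Pierce": (2, lambda x, y: (1 - x) & (1 - y)),
}

def is_linear(f, n):
    """No product of two or more letters in the canonical polynomial."""
    c = [f(*bits) for bits in product((0, 1), repeat=n)]
    for i in range(n):  # binary Moebius transform: table -> polynomial coefficients
        for mask in range(1 << n):
            if (mask >> i) & 1:
                c[mask] ^= c[mask ^ (1 << i)]
    return all(bin(mask).count("1") <= 1 for mask in range(1 << n) if c[mask])

def is_monotone(f, n):
    """No pair of sets A <= B with f(A) = 1 and f(B) = 0."""
    sets = list(product((0, 1), repeat=n))
    return not any(f(*a) == 1 and f(*b) == 0
                   for a in sets for b in sets
                   if all(p <= q for p, q in zip(a, b)))

for name, (n, f) in OPERATIONS.items():
    print(f"{name:15} linear={is_linear(f, n)!s:5} monotone={is_monotone(f, n)}")
```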

It is not difficult to derive the following important result from 
Theorems 2 and 3. 

Theorem 4. With the use of any nonlinear operation there can be 
obtained either the multiplication or disjunction operation. 

Let us consider an arbitrary nonlinear operation defined by the
nonlinear boolean function $f(x_1, x_2, \ldots, x_n)$. The canonical
polynomial of this function contains at least one product with two or
more cofactors. Among all such products let us select one of those
which have the smallest length $l$. This product contains no fewer
than two cofactors and, consequently, has the form $x_i x_j p$, where
p is the product of some set of letters (possibly empty or containing
only a single letter). Keeping the letters $x_i$ and $x_j$ unchanged,
we replace all the letters occurring in p by ones, and all the
remaining letters (from among $x_1, x_2, \ldots, x_n$) by zeros.
After this substitution the selected product becomes $x_i x_j$, and
all the remaining products of length greater than one become zero,
since each of them contains at least one letter different from $x_i$,
$x_j$ and from the letters occurring in p.

After this substitution we obtain a boolean function of two
variables $\varphi(x_i, x_j)$, whose canonical polynomial has the form
$x_i x_j + \alpha x_i + \beta x_j + \gamma$, where
$\alpha, \beta, \gamma$ are boolean constants (0 or 1). If this
function is equal to $x_i x_j$ or to
$x_i \vee x_j = x_i x_j + x_i + x_j$, then the theorem is proved.
Otherwise, being nonlinear, this function, on the strength of Theorem
3, is inevitably nonmonotone. But then, by Theorem 2, with the aid of
the boolean operation defined by the function $\varphi$ we can express
the negation $\bar{x} = x + 1$. Having available the functions $x_i$,
$x_j$, $x_i + 1$, $x_j + 1$, we can replace $x_i$ by $x_i + \beta$ and
$x_j$ by $x_j + \alpha$ in the function $\varphi$, after which we
obtain the function $\psi(x_i, x_j)$ with the canonical polynomial

$$(x_i + \beta)(x_j + \alpha) + \alpha(x_i + \beta) + \beta(x_j + \alpha) + \gamma = x_i x_j + \alpha x_i + \beta x_j + \alpha\beta + \alpha x_i + \alpha\beta + \beta x_j + \alpha\beta + \gamma = x_i x_j + \delta,$$

where the letter $\delta$ designates the boolean constant
$\alpha\beta + \gamma$. If $\delta = 0$, then
$\psi(x_i, x_j) = x_i x_j$, and the theorem is proved. If, however,
$\delta = 1$, then $\psi(x_i, x_j) = x_i x_j + 1 = \overline{x_i x_j}$.
Since we have already constructed the negation, from the last function
it is again easy to obtain the product $x_i x_j$.
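The substitutions in this proof can be carried out mechanically. The Python sketch below has hypothetical names, and again obtains the canonical polynomial by the binary Möbius transform, a standard technique not described in the text. It picks a shortest product with at least two letters, performs the substitution that yields $\varphi(x_i, x_j)$, and reads off $\alpha$, $\beta$, $\gamma$, from which $\delta = \alpha\beta + \gamma$ follows. It assumes f is nonlinear; for a linear f no suitable product exists, and `min` raises `ValueError`.

```python
from itertools import product

def canonical(table, n):
    """Canonical (Zhegalkin) polynomial coefficients via the binary Moebius transform."""
    c = list(table)
    for i in range(n):
        for mask in range(1 << n):
            if (mask >> i) & 1:
                c[mask] ^= c[mask ^ (1 << i)]
    return c

def size(mask):
    """Number of letters in the product marked by mask."""
    return bin(mask).count("1")

def reduce_to_two_variables(f, n):
    """Follow the proof of Theorem 4 for a nonlinear f of n variables (0-based indices).

    Returns (i, j, alpha, beta, gamma, phi) with
    phi(x_i, x_j) = x_i x_j + alpha x_i + beta x_j + gamma.
    Letter k corresponds to bit n-1-k of a mask (the first letter is the most significant bit).
    """
    c = canonical([f(*bits) for bits in product((0, 1), repeat=n)], n)
    # a shortest product containing at least two letters
    mask = min((m for m in range(1 << n) if c[m] and size(m) >= 2), key=size)
    letters = [k for k in range(n) if (mask >> (n - 1 - k)) & 1]
    i, j = letters[0], letters[1]
    # letters of p are replaced by 1, letters outside the selected product by 0
    base = [1 if k in letters else 0 for k in range(n)]

    def phi(u, v):
        point = list(base)
        point[i], point[j] = u, v
        return f(*point)

    gamma = phi(0, 0)
    alpha = phi(1, 0) ^ gamma
    beta = phi(0, 1) ^ gamma
    return i, j, alpha, beta, gamma, phi

# Example: f(x1, x2, x3) = x1 x3 + x2 + 1
i, j, alpha, beta, gamma, phi = reduce_to_two_variables(lambda a, b, c: (a & c) ^ b ^ 1, 3)
delta = (alpha & beta) ^ gamma
print(i, j, alpha, beta, gamma, delta)  # 0 2 0 0 1 1, so psi = x1 x3 + 1
```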

Thus, in all cases we can, with the aid of the given operation,
construct an expression for the function $xy$ or $x \vee y$, and
consequently also for the operation defined by it, Q.E.D. Now it is
easy to derive the following condition of completeness for sets of
boolean operations.

Theorem 5. In order that a set of boolean operations be complete,
it is necessary and sufficient that this set include at least one
nonlinear operation and at least one nonmonotone operation.

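The necessity of this condition was established above. The
sufficiency is a direct consequence of Theorems 2 and 4: they yield
negation together with multiplication or disjunction, and each of
these pairs of operations forms a complete set.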

We agree to call a complete set of boolean operations irreducible
if we cannot exclude a single operation from it without the set losing
its property of completeness. Theorem 5 makes it easy to list all the
irreducible complete sets composed of single-place and two-place
boolean operations. These are four sets, each of which consists of a
single operation (implication, inhibit, the Sheffer operation and the
Pierce operation), and six irreducible complete sets consisting of two
operations: the combinations of the operation of multiplication with
each of the operations of negation, addition or equivalence, and the
combinations of the operation of disjunction with each of the same
three operations.
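Theorem 5 reduces completeness to two properties per operation, so the list of irreducible complete sets can be confirmed by enumeration. In the Python sketch below (names hypothetical), the (linear, monotone) flags are copied from Theorem 3 rather than recomputed.

```python
from itertools import combinations

# (linear, monotone) for each of the nine operations, as stated in Theorem 3
CLASSES = {
    "negation": (True, False), "addition": (True, False), "equivalence": (True, False),
    "multiplication": (False, True), "disjunction": (False, True),
    "implication": (False, False), "inhibit": (False, False),
    "Sheffer": (False, False), "Pierce": (False, False),
}

def is_complete(ops):
    """Theorem 5: at least one nonlinear and at least one nonmonotone operation."""
    return (any(not CLASSES[o][0] for o in ops)
            and any(not CLASSES[o][1] for o in ops))

def irreducible_complete_sets():
    found = []
    for size in range(1, len(CLASSES) + 1):
        for ops in combinations(CLASSES, size):
            # complete, and removing any single operation destroys completeness
            if is_complete(ops) and all(not is_complete(set(ops) - {o}) for o in ops):
                found.append(set(ops))
    return found

sets = irreducible_complete_sets()
print(len(sets), sorted(len(s) for s in sets))  # 10 [1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
```

No irreducible set has three or more operations. Only two conditions have to be met, so any complete set already contains a complete subset of at most two operations.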

The concept of completeness which we have used has made it possi- 
ble in the construction of the boolean functions to use not only the 
arguments of these functions and the operations from the corresponding 
complete set, but also the boolean constants 0 and 1. If we exclude
the possibility of using the constants, then there arises a new con- 
cept of completeness which we shall term strong completeness or com- 
pleteness in the strong sense. 

By no means all the complete sets of the boolean operations satis- 
fy the condition of strong completeness. For example, the set consist- 
ing of the operations of addition and multiplication, being complete, 
nevertheless is not complete in the strong sense. It is easy to see 
that without the use of the constant 1 all the boolean functions con- 
structed with the aid of this set of operations vanish at the point at 
which all their arguments take on zero values. Thus, with the use of
only the operations of addition and multiplication (without the
constant 1), a whole series of boolean functions cannot be
represented, for example the function $\bar{x}$ or the function
identically equal to unity.

At the same time, the sets composed of the operations of negation
and multiplication or of negation and disjunction are complete not
only in the conventional sense but also in the strong sense. To
convince ourselves of this it is sufficient, obviously, to show that
the constants 0 and 1 can be represented with the aid of the indicated
operations. This is done by the formulas $0 = x\bar{x}$,
$1 = \overline{x\bar{x}}$, $1 = x \vee \bar{x}$,
$0 = \overline{x \vee \bar{x}}$.

The necessary and sufficient conditions for strong completeness of
sets of boolean operations were found by Post [62]. In order to
formulate these conditions, we must become acquainted with three new
remarkable classes of boolean functions and the operations which they
define.

The boolean function (operation) $f(x_1, x_2, \ldots, x_n)$ is
termed a zero-preserving function (operation) if
$f(0, 0, \ldots, 0) = 0$, a unity-preserving function (operation) if
$f(1, 1, \ldots, 1) = 1$, and a self-dual function (operation) if
$f(x_1, x_2, \ldots, x_n) = \overline{f(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n)}$.
The result of Post mentioned above can now be formulated as follows.

Theorem 6. In order that a set of boolean operations be complete in
the strong sense, it is necessary and sufficient that this set include
at least one nonlinear operation, at least one nonmonotone operation,
at least one non-zero-preserving operation, at least one
non-unity-preserving operation, and at least one operation which is
not self-dual.

The necessity of the conditions formulated in Theorem 6 is proved
by exactly the same method as in the case of conventional
completeness. We need only convince ourselves (and this is not
difficult to do) that, without the use of constants, the
zero-preserving operations allow us to construct only those boolean
functions (and hence boolean operations) which also preserve zero.
The situation is the same with the operations which preserve unity and
with the self-dual operations. The proof of sufficiency reduces to
establishing that the constants 0 and 1 can be constructed, followed
by an application of Theorem 5. The details of this proof can be found
in the article of Yablonskiy [83] (see also Glushkov [26]).

Of the nine single-place and two-place boolean operations listed
above, five are not zero-preserving: negation, the equivalence
operation, implication, and also the Sheffer and Pierce operations.

The list of operations which are not unity-preserving also includes
five operations: negation, addition, and the inhibit, Sheffer and
Pierce operations.

Finally, all the operations other than negation are not self-dual:
multiplication, disjunction, addition, implication, and also the
operations of equivalence, inhibit, Sheffer and Pierce.
The Sheffer and Pierce operations possess a most remarkable
property: each of them, taken by itself, forms a set of boolean
operations complete in the strong sense. These sets, of course, are
irreducible, in the sense that we cannot remove a single operation
from them without the set losing the property of strong completeness.
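Post's criterion and the lists above can be checked in the same way. The Python sketch below uses hypothetical names. It computes zero preservation, unity preservation and self-duality directly from the definitions, copies the linear and monotone flags from Theorem 3, and tests a set of operations against the five conditions of Theorem 6.

```python
from itertools import product

# name -> (arity, function, linear, monotone); the last two flags are copied from Theorem 3
OPS = {
    "negation": (1, lambda x: 1 - x, True, False),
    "multiplication": (2, lambda x, y: x & y, False, True),
    "disjunction": (2, lambda x, y: x | y, False, True),
    "addition": (2, lambda x, y: x ^ y, True, False),
    "equivalence": (2, lambda x, y: 1 - (x ^ y), True, False),
    "implication": (2, lambda x, y: (1 - x) | y, False, False),
    "inhibit": (2, lambda x, y: x & (1 - y), False, False),
    "Sheffer": (2, lambda x, y: (1 - x) | (1 - y), False, False),
    "Pierce": (2, lambda x, y: (1 - x) & (1 - y), False, False),
}

def post_classes(name):
    """(linear, monotone, zero-preserving, unity-preserving, self-dual) for one operation."""
    n, f, linear, monotone = OPS[name]
    zero_preserving = f(*[0] * n) == 0
    unity_preserving = f(*[1] * n) == 1
    # self-dual: f(x1, ..., xn) equals the negation of f(not x1, ..., not xn)
    self_dual = all(f(*a) == 1 - f(*[1 - v for v in a])
                    for a in product((0, 1), repeat=n))
    return linear, monotone, zero_preserving, unity_preserving, self_dual

def strongly_complete(names):
    """Theorem 6: for each of the five classes some operation of the set lies outside it."""
    flags = [post_classes(name) for name in names]
    return all(any(not fl[k] for fl in flags) for k in range(5))

print(strongly_complete(["Sheffer"]), strongly_complete(["addition", "multiplication"]))
# True False
```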

It is easy to show that every operation which is not zero-preserving
is either also not unity-preserving or is not self-dual. Indeed, if
$f(0, 0, \ldots, 0) = 1$ and $f(1, 1, \ldots, 1) = 1$, then f cannot
be self-dual, since self-duality requires
$f(1, 1, \ldots, 1) = \overline{f(0, 0, \ldots, 0)}$. This implies
that an irreducible strongly complete set of boolean operations cannot
contain more than four (and not five, as it might seem a priori)
different operations. Irreducible strongly complete sets consisting of
four different boolean operations actually do exist.


§4. APPLICATION OF BOOLEAN ALGEBRA IN THE THEORY OF COMBINATION CIRCUITS


Combination circuits are the simplest technical devices for the
conversion of discrete information. Let us assume that we have at our
disposal a finite number of types of signals of a particular nature
(mechanical, electrical, optical, etc.) composing the so-called signal
alphabet S. We shall use the term combination circuit for any device P
which realizes some alphabetic operator A = A(S) in the alphabet S and
satisfies the following conditions.

1. The domain of definition of the operator A is the set of words
in the alphabet S having a fixed length $m \geq 1$ (depending on the
choice of the device P).

2. All the input words from the domain of definition of the
operator A are transformed by the circuit P into output words of the
same length $n \geq 1$ (also depending on the choice of P).

3. All the letters (signals) composing the input word are applied
simultaneously to m points of the circuit P which are called its input
poles, and at the same time, also simultaneously, all the letters
(signals) of the corresponding output word appear at another n points
of the circuit which are called its output poles. The input and output
poles are numbered in a strictly fixed way and are associated with the
corresponding positions of the input and output words, so that the
i-th letter of the input word is applied to the i-th input pole
(i = 1, 2, ..., m) and the j-th letter of the output word appears at
the j-th output pole (j = 1, 2, ..., n).

Of course, every real technical device has some internal delay, so
the condition that the input and output signals of a combination
circuit appear simultaneously is not to be understood too literally.
We are considering an abstraction of the practically encountered case
in which this delay can be neglected in comparison with the interval
of discrete operation of the circuit, which is determined by the time
needed to replace one input word by another.

In practice, combination circuits are usually characterized by the
absence of memory. This means that the output word appears at the
output poles of the circuit only while the corresponding input word is
applied to the input poles. After the application of the input signals
has been terminated, the circuit "forgets" these signals, so they
cannot affect the formation of the response of the circuit to the next
combination of signals applied to its input poles.

Conditions 1 and 2 impose, at first glance, very strong limita- 
tions on the alphabetic operators which can be realized by the combina- 
tion circuits. In actuality, however, words of the same length (select- 
ed each time in accordance with the specific conditions) can be used 
to code any finite ensemble of words. 

Thus, with suitable coding the combination circuits can realize 
any alphabetic operators with finite domains of definition. 

The simplest technique for equalizing the lengths of any fixed set
of words by coding consists in appending (generally speaking,
repeatedly) to the shorter words an empty letter, specially introduced
into the alphabet for this purpose, until each of them contains as
many letters as the longest word of the set in question. Of course,
other techniques for resolving this problem are possible.
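The padding technique can be sketched in a few lines of Python. Here words are strings and the empty letter is the character "_"; both are assumptions made only for the illustration.

```python
def equalize(words, blank="_"):
    """Append the empty letter to shorter words until all have the length of the longest."""
    if any(blank in w for w in words):
        raise ValueError("the empty letter must be a new letter of the alphabet")
    width = max(len(w) for w in words)
    return [w + blank * (width - len(w)) for w in words]

coded = equalize(["ab", "a", "abc"])
print(coded)                           # ['ab_', 'a__', 'abc']
print([w.rstrip("_") for w in coded])  # decoding: ['ab', 'a', 'abc']
```

The empty letter must not already belong to the alphabet. Under that condition, stripping it recovers every original word, so the coding is one-to-one.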

We note also that, with a suitable interpretation of the operation
of combination circuits, we can consider that the same combination
circuit is capable of realizing not one but any finite set of
alphabetic operators. To accomplish this it is sufficient to divide
all the input poles of the circuit into the so-called information
poles and control poles. If we consider the transformed input word to
be only that combination of input signals which is applied to the
information poles, then by fixing various control words (i.e., words
applied to the control poles) we obtain different alphabetic operators
which associate output words of the circuit with the information input
words.
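As a small illustration, consider a hypothetical boolean circuit with one control pole c and two information poles x, y. The circuit is our own example, not the book's. Fixing c = 0 makes it realize multiplication of the information signals, and fixing c = 1 makes it realize disjunction.

```python
def circuit(c, x, y):
    """Hypothetical (3, 1)-terminal network: c is a control pole, x and y are information poles."""
    return ((1 - c) & x & y) | (c & (x | y))

def operator_for(control):
    """Fixing the control word selects the operator applied to the information word."""
    return lambda x, y: circuit(control, x, y)

and_op, or_op = operator_for(0), operator_for(1)
print([and_op(x, y) for x in (0, 1) for y in (0, 1)])  # [0, 0, 0, 1]
print([or_op(x, y) for x in (0, 1) for y in (0, 1)])   # [0, 1, 1, 1]
```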

The technique for the variation of the alphabetic operators real- 
ized by the combination circuit with the use of the control words is 
completely analogous to the technique described in the first chapter 
for the organization of the operation of the universal algorithm: to 
the input of the universal algorithm there is applied not only the in- 
formation word to be transformed but also the control word, for which 
we select the representation of the specific algorithm which is to be 
realized. 

For technical reasons it is simpler and more convenient to select
the binary alphabet as the signal alphabet. In this case the two types
of signals are usually identified with the boolean constants 0 and 1.
We shall term combination circuits with such a signal alphabet binary,
or boolean, combination circuits.

In the binary combination circuit each output signal is some bool- 
ean function of the input signals of the circuit. If the circuit has m 
input and n output poles, then the alphabetic operator realized by it 
is completely characterized by the system of n boolean functions of m 
variables which give the output signals on each of the n output poles 
as a function of the signals on the m input poles. We shall term this 
system of functions the output functions of the circuit in question, 
and the circuit itself will be termed a boolean (m, n)-terminal network.

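For example, a binary half adder is a boolean (2, 2)-terminal network whose two output functions are the sum digit $x + y$ and the carry digit $xy$; this example is our illustration and is not discussed in the book here. The Python sketch below shows how such a system of output functions fixes the alphabetic operator.

```python
from itertools import product

# One output function per output pole of a (2, 2)-terminal network (a half adder).
OUTPUT_FUNCTIONS = [
    lambda x, y: x ^ y,  # pole 1: sum digit, addition modulo two
    lambda x, y: x & y,  # pole 2: carry digit, multiplication
]

def network(word):
    """Alphabetic operator: an input word of length m = 2 to an output word of length n = 2."""
    return tuple(f(*word) for f in OUTPUT_FUNCTIONS)

for word in product((0, 1), repeat=2):
    print(word, "->", network(word))
```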
The results of the preceding section lay the theoretical foundation
for one of the primary problems of the theory of boolean (m, n)-
terminal networks: the problem of their synthesis. The essence of the
problem of synthesis of combination circuits in general, and of
boolean (m, n)-terminal networks in particular, is the development of
methods which make it possible to construct circuits as complex as
desired from a fixed (usually quite small) number of types of
elementary combination circuits, which in the case of binary circuits
are called logic elements.

Any boolean (m, 1)-terminal network can be selected as a logic
element. In view of what we have said above, its operation can be
characterized by the output function $f(x_1, x_2, \ldots, x_m)$, which
is a boolean function of m variables giving the single output signal
of the selected element as a function of the ensemble of all its input
signals. We say that the selected logic element realizes this boolean
function or, correspondingly, realizes the boolean operation defined by
this function.

Let us assume now that some set of logic elements has been
selected. The synthesis of a combination circuit from the elements of
the selected set amounts to the sequential connection of the output
poles of some elements to the input poles of other elements in such a
way that several output poles are not connected to the same input
pole, and so that no closed circuits are formed along which a signal
emerging from some element Q and passing, possibly, through other
elements can again arrive at one of the input poles of the same
element Q. Here we shall assume that we have at our disposal an
unlimited number of copies of every element of the selected set, so
that there is never a shortage in the quantity of logic elements
(although the number of their types is limited).
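The connection rules above can be mirrored in a small evaluator. The Python sketch below uses a hypothetical representation that does not come from the book. Each element lists exactly one source per input pole, which enforces the first restriction automatically. A depth-first evaluation reports a closed loop as an error, which enforces the second. The example builds addition modulo two from AND, OR and negation elements as $(x \vee y)\overline{xy}$.

```python
def evaluate(elements, inputs):
    """Evaluate a circuit assembled from logic elements.

    elements: name -> (function, [source, ...]); a source is an input name or an element name.
    Each input pole of an element has exactly one source, so outputs never meet at one input.
    Raises ValueError if the connections form a closed loop.
    """
    values = dict(inputs)
    in_progress = set()

    def value_of(name):
        if name in values:
            return values[name]
        if name in in_progress:
            raise ValueError(f"closed loop through element {name!r}")
        in_progress.add(name)
        func, sources = elements[name]
        values[name] = func(*[value_of(s) for s in sources])
        in_progress.discard(name)
        return values[name]

    for name in elements:
        value_of(name)
    return values

# (x v y) AND not(xy), i.e. addition modulo two, from AND, OR and negation elements
XOR_CIRCUIT = {
    "and1": (lambda p, q: p & q, ["x", "y"]),
    "or1": (lambda p, q: p | q, ["x", "y"]),
    "not1": (lambda p: 1 - p, ["and1"]),
    "out": (lambda p, q: p & q, ["or1", "not1"]),
}
print([evaluate(XOR_CIRCUIT, {"x": x, "y": y})["out"] for x in (0, 1) for y in (0, 1)])
# [0, 1, 1, 0]
```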

After completing the described process of the connection of the 
output poles of some elements to the input poles of others, some set M 
of input poles and some set N of output poles are free of any connec- 
tions with other poles. It is natural now to take the set M as the set 
of input poles and the set N as the set of output poles of the complex 
circuit constructed as a result of the described process. 
If in the process of the connection of the poles we have observed
the limitations presented above, then the circuit constructed will 
give an output signal on each of the n poles of the set N as a com- 
pletely determined boolean function of the signals on all m poles of 
the set M. Therefore we can consider it as a combination circuit in 
the binary alphabet or, more precisely, as a boolean (m, n)-terminal 
network. 

It is easy to understand that the set N of output poles of the
circuit can be supplemented by some of the poles which have been
subjected to connection; we shall call such poles the internal nodes
of the circuit. With some types of physical realization of the binary
signals, we can connect several output poles to the same input pole.
No ambiguity arises from the arrival of several signals at the same
pole, owing to the so-called natural separation of signals. Natural
separation means that a zero signal is formed at a particular pole if
and only if all the signals arriving simultaneously at this pole are
equal to zero. If, however, even one of the arriving signals is equal
to one, then the combined signal is equal to one; in other words, the
combined signal is the disjunction of the arriving signals. In this
case the input signals of the circuit can, evidently, also be applied
to certain of its internal nodes, as a result of which these nodes are
included in the set M of input poles of the circuit.

If the synthesized circuit has a single output pole and is thus
characterized by a single output boolean function
$f(x_1, x_2, \ldots, x_m)$, the described process of constructing the
circuit by sequential connection of the nodes in essence repeats the
process of the sequential construction of a formula representing the
function $f(x_1, x_2, \ldots, x_m)$ with the aid of the operations
realized by the logic elements used. The synthesis of an arbitrary
boolean (m, 1)-terminal network is possible if the set of these
operations is strongly complete. Since every (m, n)-terminal network
can be made up of n individual (m, 1)-terminal networks, when the
condition of strong completeness is satisfied we obtain the
possibility of constructing arbitrary binary combination circuits.

In practice, however, it usually turns out that it is not difficult
to apply to the synthesized circuit signals identically (at all
instants of time) equal to zero and to one, using channels specially
assigned for this purpose. Moreover, for the zero signal we frequently
do not need any special channel, since with some physical realizations
of the signals a zero signal appears on every isolated input pole,
i.e., one not connected anywhere. In this case the condition for the
possibility of synthesizing an arbitrary binary combination circuit is
no longer strong completeness, but rather ordinary completeness of the
set of operations realized by the selected logic elements. In this
case, for brevity, we speak of the completeness or incompleteness of
the set of logic elements themselves, rather than of the boolean
operations realized by them.

Among the logic elements most frequently used in practice are the
so-called AND and OR elements, which realize respectively the boolean
operations of multiplication and disjunction. As a rule, along with
the two-input AND and OR elements which realize the functions $xy$ and
$x \vee y$, wide use is made of multi-input variants of these elements
which realize the boolean functions $x_1 x_2 \cdots x_n$ and
$x_1 \vee x_2 \vee \cdots \vee x_n$.

The boolean (1, 1)-terminal network which performs the negation
operation also frequently figures among the logic elements under the
name of inverter. When the signals 0 and 1 are realized in the
so-called potential circuits by two different levels of electrical
potential, the AND and OR elements can be constructed with the aid of
resistors and semiconductor diodes, and the inverter with the use of
resistors and semiconductor triodes (transistors).

When we use two-input AND, OR and inverter elements as the set of
logic elements, the problem of the synthesis of boolean (m, 1)-terminal
networks reduces to the problem of constructing formulas of boolean
algebra which represent the output functions of these (m, 1)-terminal
networks. The interest lies not in constructing some circuit with the
given output function (in view of what we have just said, this is not
difficult), but in constructing an adequately economical circuit which
uses the smallest possible number of logic elements. In this case the
problem of constructing economical circuits reduces to the problem of
minimizing the formulas of boolean algebra.

Quite frequently in practice, in the construction of a particular
combination circuit we are able to apply to its input poles not only
the input signals of interest to us, $x_1, x_2, \ldots, x_m$, but also
their negations $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_m$. In this
case it is clearly sufficient for the synthesis of the circuit to have
only AND and OR elements, and the problem of constructing sufficiently
economical circuits is usually solved only in the class of the
so-called two-stage circuits, i.e., circuits in which all AND elements
precede all OR elements or, conversely, all OR elements precede all
AND elements. Such circuits are obviously described by disjunctive or
conjunctive normal forms, for which minimization methods were
discussed in §2 of the present chapter.

As an example of the synthesis of the two-stage combination cir- 
cuit let us consider the synthesis of the boolean (6,1)-terminal net- 
work with the output function f = xyzvxyvxvzvyzv2z,. assuming that to the 
six input poles of our circuit there are applied the signals
$x, y, z, \bar{x}, \bar{y}, \bar{z}$, and as logic elements we select
the two-input AND and OR elements.

If we design the circuit in strict accordance with the originally
given formula representing the function f, then the circuit will
contain 7 AND and 4 OR elements. If, however, we minimize this formula
using the Blake method, as was done (precisely for this formula) at
the end of §2, then it turns out that the given function f can be
represented by a far simpler formula: f=yz\vxz. The circuit
corresponding to this formula contains in all two AND elements and one
OR element. Representing AND and OR elements by circles with the
letters C and P inside, we can depict the constructed circuit visually
(Fig. 5). In the constructed circuit the input poles to which the
signals x and y are applied are actually not used. Therefore the given
output function can be realized by a boolean (4, 1)-terminal network
rather than by the (6, 1)-terminal network assumed initially.

[Fig. 5: the constructed circuit, with two AND elements (C) and one OR element (P)]

In this example the output function of the circuit to be synthesized was given in the form of some formula of boolean algebra, so that the synthesis process reduced in essence only to the simplification of this formula. In practice we most frequently encounter the case when the output functions of the circuit to be synthesized are given by tables of their values. In this case the first stage of the process of circuit synthesis is the finding of some (not necessarily the simplest) formulas which represent the given functions. A universal technique for such construction is the method based on the use of the ideal disjunctive normal forms (see §2): any boolean function can be represented in the form of the disjunction of the constituents of unity corresponding to those sets of values of the variables on which this function becomes unity. The representation obtained is then subjected to minimization.
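The universal technique just described can be sketched in a few lines of Python. This is our own illustration, not a procedure from the book: the table of values is assumed to be a dict from argument tuples (0/1) to function values, and a negated variable is written with a prime (x' for x̄) because overbars are awkward in program output.

```python
from itertools import product

def constituents_of_unity(table, names):
    """Ideal disjunctive normal form of a function given by its table of values.

    table maps each tuple of argument values (0/1) to the function value (0/1).
    Every set on which the function equals 1 contributes one constituent of
    unity: the variable itself where its value is 1, its negation x' where 0.
    """
    terms = []
    for values in sorted(table):
        if table[values] == 1:
            literals = [n if v == 1 else n + "'" for n, v in zip(names, values)]
            terms.append("".join(literals))
    return " ∨ ".join(terms) if terms else "0"

# Example: the majority function of three variables, given only by its table.
names = ("x", "y", "z")
table = {vals: int(sum(vals) >= 2) for vals in product((0, 1), repeat=3)}
print(constituents_of_unity(table, names))  # x'yz ∨ xy'z ∨ xyz' ∨ xyz
```

The result is generally far from minimal (here it has four constituents, while xy ∨ xz ∨ yz suffices), which is why the minimization stage follows.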

In the case when the number of variables is relatively small, it is convenient to search for the irreducible and even the minimal disjunctive normal forms representing the given functions directly from the tables of the values of these functions. To facilitate this search, use is made of a special way of writing these tables, the so-called Karnaugh maps (Veitch diagrams).

The Karnaugh map is a table with four rows designated by the various sets of values of the first two variables x and y and with four columns designated by the various sets of values of the last two variables z, u. The map field (for the case of four variables) is thus divided into 16 squares, which are numbered by the numbers from 0 to 15 inclusive; each square receives the number whose binary notation is the set of values (x, y, z, u) it represents. The Karnaugh map for four variables:

    xy \ zu   00   01   11   10
      00       0    1    3    2
      01       4    5    7    6
      11      12   13   15   14
      10       8    9   11   10

In using this map to specify a particular boolean function f(x, y, z, u), in each square there is written the value of this function (0 or 1) on that set of values of the variables whose number coincides with the number of the given square (the first two elements of the set designate the row and the second two elements the column, at the intersection of which the square in question is located).

With this formulation, in the case of partial boolean functions the squares of the map are of three kinds: designated with zero, designated with unity, and not designated at all. The last squares correspond to those sets on which the values of the function in question are not defined. In the case of everywhere-defined boolean functions, in all the squares there will be written either zero or unity; therefore these functions can be specified by indicating only those squares in which ones are written, or, as we shall say, by indicating the ones configuration of the function in question.

The Karnaugh map is constructed so that the ones configurations which give the various elementary products are recognized very simply. For the case of four variables considered here, the ones configurations of the elementary products of length 4 (constituents of unity) reduce to separate squares of the Karnaugh map, or, as we shall say here, to elementary squares.

The corresponding configurations which give all $C_4^3 \cdot 2^3 = 32$ elementary products of length 3 are all possible pairs of elementary squares standing in a line and thus composing a rectangle with dimensions 2 × 1. It is only necessary to mentally identify the opposite edges of the Karnaugh map: the upper with the lower and the left with the right. As the result of this identification it is necessary to consider, for example, that the elementary squares with numbers 4 and 6 or with numbers 0 and 8 stand in a line, while the elementary squares 5 and 9 or 7 and 2 must not be considered as standing in a line.
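The numbering and the identification of opposite edges can be checked mechanically. The Python sketch below assumes that the rows (x, y) and the columns (z, u) are labelled in the cyclic order 00, 01, 11, 10, which is the order under which the adjacencies just listed hold. It confirms that two squares stand in a line exactly when their sets differ in the value of a single variable. All function names are our own.

```python
# Cyclic (Gray-code) order of the row labels (x, y) and column labels (z, u).
GRAY = [(0, 0), (0, 1), (1, 1), (1, 0)]

def square_number(x, y, z, u):
    """Number of the square for the set (x, y, z, u): its value in binary."""
    return 8 * x + 4 * y + 2 * z + u

def position(n):
    """(row, column) of square n on the 4 x 4 map."""
    x, y, z, u = (n >> 3) & 1, (n >> 2) & 1, (n >> 1) & 1, n & 1
    return GRAY.index((x, y)), GRAY.index((z, u))

def neighbours(a, b):
    """True if squares a and b form a 2 x 1 rectangle, opposite edges identified."""
    (r1, c1), (r2, c2) = position(a), position(b)
    dr, dc = (r1 - r2) % 4, (c1 - c2) % 4
    return (dr in (1, 3) and dc == 0) or (dr == 0 and dc in (1, 3))

def differ_in_one_variable(a, b):
    """The same adjacency stated in terms of the sets themselves."""
    return bin(a ^ b).count("1") == 1

for row in GRAY:
    print([square_number(*row, *col) for col in GRAY])
```

The printed rows reproduce the four-variable map: 0 1 3 2, 4 5 7 6, 12 13 15 14, 8 9 11 10.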

Similarly, the ones configurations which give all $C_4^2 \cdot 2^2 = 24$ elementary products of length 2 are all possible combinations of four elementary squares forming (4 × 1)-rectangles and (2 × 2)-squares, and for all $C_4^1 \cdot 2 = 8$ elementary products of length 1 the corresponding configurations are given by all possible combinations of elementary squares in (4 × 2)-rectangles. Here we must not forget the identification of the opposite edges of the Karnaugh map.

The elementary product corresponding to any of the ones configurations listed above is easily found, since such a product is composed of all those and only those cofactors (x, x̄, y, ȳ, z, z̄, u, ū) which become unity on all the sets of values of the variables covered by the given configuration.

Using this rule it is easy to find, for example, the configuration of the elementary product xyz, and to see that to the configuration ((2 × 2)-square!) consisting of the elementary squares 0, 2, 8, 10 there corresponds the elementary product ȳū.
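The rule for reading off the elementary product translates directly into code. In this Python sketch, which is our own illustration, square n stands for the set given by the binary digits of n (x most significant), and x' again denotes x̄. It keeps a cofactor exactly when the corresponding variable has the same value on every covered set. For a configuration that is not one of the admissible rectangles, it returns the shortest product containing all the given squares. That product then also covers extra squares.

```python
def elementary_product(squares, names=("x", "y", "z", "u")):
    """Product of all cofactors equal to 1 on every set covered by `squares`."""
    k = len(names)
    literals = []
    for i, name in enumerate(names):
        # Values taken by the i-th variable on the covered sets.
        bits = {(n >> (k - 1 - i)) & 1 for n in squares}
        if bits == {1}:
            literals.append(name)        # variable is 1 everywhere: cofactor x
        elif bits == {0}:
            literals.append(name + "'")  # variable is 0 everywhere: cofactor x'
    return "".join(literals) or "1"

print(elementary_product([0, 2, 8, 10]))  # y'u'
print(elementary_product([14, 15]))       # xyz
```
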

When the boolean function f is given by the Karnaugh map, finding 
the irreducible and minimal disjunctive normal forms which represent 
this function reduces to finding the most economical coverings of the 
ones configuration which gives the function f using the ones config- 
urations described above which correspond to the elementary products 
of different length (see §2). 

Let us consider as an example the problem of finding such a mini- 
mal covering for the boolean function f given by the Karnaugh map 

It is assumed that in the squares in which there are dashes the values of the function f can be arbitrary, so that if in the formation of a particular desired configuration it is necessary to place a one in a particular one of these squares, this can always be done.

It is easy to see that all the (4 × 2)-rectangles which can be constructed on the given map include at least one zero of the function f. This means that among the elementary products of length 1 there is no implicant of the function in question. There are two (2 × 2)-squares which do not contain zeros of the function f: the "square" consisting of the four corner elementary squares and the "square" standing in the lower right corner of the map (it contains three ones and one dash). Together, these squares cover all the ones of the function f, and therefore this function (up to the indifferent values designated by the dashes) can be represented in the form of the disjunction of the corresponding elementary products: f = ȳū ∨ xz.

The disjunctive normal form found is, as it is easy to see, minimal for the function f under any possible interpretation of the indifferent values designated by the dashes.
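For a four-variable map the search for a most economical covering can also be done by brute force. The following Python sketch is our own exhaustive illustration, not the book's procedure, and it becomes hopeless as the number of variables grows. It lists every elementary product that covers no zero (dashes may be covered or not). It then tries coverings with 1, 2, ... products, breaking ties by the total number of letters. As a check it is fed the ones configuration of ȳū ∨ xz itself, squares 0, 2, 8, 10, 11, 14, 15. The actual map of the example is not reproduced here.

```python
from itertools import combinations, product

NAMES = ("x", "y", "z", "u")

def squares_of(spec):
    """Squares 0..15 (x the most significant binary digit) covered by a product.

    spec gives, for each variable, None (absent), 0 (negated) or 1 (plain).
    """
    return frozenset(
        n for n in range(16)
        if all(v is None or ((n >> (3 - i)) & 1) == v for i, v in enumerate(spec))
    )

def text(spec):
    """Write a product in the x'y notation ("1" for the empty product)."""
    return "".join(NAMES[i] + ("" if v else "'")
                   for i, v in enumerate(spec) if v is not None) or "1"

def minimal_cover(ones, dashes=()):
    """Fewest products (then fewest letters) covering every one and no zero."""
    ones, dashes = frozenset(ones), frozenset(dashes)
    if not ones:
        return []
    zeros = frozenset(range(16)) - ones - dashes
    cover_of = {s: squares_of(s) for s in product((None, 0, 1), repeat=4)}
    # Implicants: products whose configuration contains no zero of the function.
    implicants = [s for s, sq in cover_of.items() if not (sq & zeros)]
    for k in range(1, len(implicants) + 1):
        candidates = [c for c in combinations(implicants, k)
                      if ones <= frozenset().union(*(cover_of[s] for s in c))]
        if candidates:
            best = min(candidates,
                       key=lambda c: sum(v is not None for s in c for v in s))
            return sorted(text(s) for s in best)
    return []

print(minimal_cover({0, 2, 8, 10, 11, 14, 15}))  # ['xz', "y'u'"]
```
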

The described technique for directly finding the minimal disjunctive forms is applicable not only to boolean functions of four variables, but also to functions of a smaller number of variables.

The Karnaugh maps of general form for functions of three and two variables:

    xy \ z    0    1
      00      0    1
      01      2    3
      11      6    7
      10      4    5

    x \ y     0    1
      0       0    1
      1       2    3

In using the first table it is necessary to mentally identify the upper and lower edges, so that the elementary squares with numbers 0, 4 and 1, 5 are to be considered neighboring.

With the aid of certain additional tricks we can construct Karnaugh maps for 5 and 6 variables. For a larger number of variables in the general case, the problem of finding the minimal representations directly from the tables of the boolean functions becomes so cumbersome that the corresponding Karnaugh maps are of little assistance. In these cases we must resort to analytic methods for the minimization of formulas, such as the Blake method and other similar methods.
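Here the Blake method is only named. As a hedged sketch of the idea usually associated with it, the Python below repeatedly adds consensus terms and deletes absorbed products until the set of products stops changing. The result is the set of all prime implicants. The representation of products and every helper name are our own, and the procedure of §2 may be organized differently.

```python
from itertools import combinations

# A literal is (letter, value): ("x", 1) stands for x, ("x", 0) for x'.
# An elementary product is a frozenset of literals.

def term(text):
    """Parse a product written like "x'z" into a frozenset of literals."""
    literals, i = [], 0
    while i < len(text):
        if i + 1 < len(text) and text[i + 1] == "'":
            literals.append((text[i], 0))
            i += 2
        else:
            literals.append((text[i], 1))
            i += 1
    return frozenset(literals)

def consensus(p, q):
    """Consensus of p and q, or None unless exactly one letter clashes."""
    clashes = [v for (v, s) in p if (v, 1 - s) in q]
    if len(clashes) != 1:
        return None
    return frozenset(lit for lit in p | q if lit[0] != clashes[0])

def absorb(terms):
    """Delete every product that contains another product of the set."""
    return {t for t in terms if not any(s < t for s in terms)}

def blake_closure(terms):
    """Add consensuses and absorb until nothing changes."""
    terms = absorb(set(terms))
    while True:
        new = set()
        for p, q in combinations(terms, 2):
            c = consensus(p, q)
            if c is not None and not any(s <= c for s in terms):
                new.add(c)
        if not new:
            return terms
        terms = absorb(terms | new)

def show(terms):
    """Sorted list of products in the x'y notation ("1" for the empty product)."""
    return sorted("".join(v + ("" if s else "'") for v, s in sorted(t)) or "1"
                  for t in terms)

print(show(blake_closure([term("xy"), term("x'z")])))  # ["x'z", 'xy', 'yz']
```

The closure contains all prime implicants, but it is not yet a minimal form. A minimal disjunctive normal form still has to be selected from among them.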

The finding of the minimal disjunctive normal forms for the output functions of boolean multi-terminal networks is extremely useful not only for the synthesis of the two-stage circuits using AND and OR elements described above, but also for the synthesis of circuits using gate elements, usually termed simply gates.
A gate is a binary combination circuit with two input poles and one output pole. The operation of the gate amounts to the fact that it passes or does not pass to its output pole a signal applied at one of its input poles (termed the gated pole), depending on whether there is applied to the second of the input poles (termed the control pole) a signal equal to one or, correspondingly, a signal equal to zero.

In gate circuits (i.e., in circuits composed of gates), the signals applied to the control poles of all the gates are equal to some of the initial variables x, y, z, ... and their negations x̄, ȳ, z̄, .... In addition, there is still another input pole of the circuit, to which there is applied the gating input signal, identically equal to one. For the gating signals the property of natural separation (see above) is satisfied, which ensures that when several gating signals are applied to the same pole, the signal appearing on this pole will be equal to the disjunction of all these signals. The output signals of the gate circuits are also signals of the gating type.

The gating circuit for the case of a single output pole can be 
completely constructed using any formula which represents the output 
function of the circuit with the aid of the operations of multiplica- 
tion and disjunction applied to the input variables and their negations 
(an example of such a formula might be any disjunctive normal form). 
With this construction, to every multiplication there corresponds a se- 
ries connection, and to every disjunction there corresponds a parallel 
connection of gates or gate circuits composed of several gates. 

If we designate a gate by a circle with the letter B inside, then a gate circuit composed in accordance with the formula f = (x ∨ y)z ∨ xy will have the form shown in Fig. 6. At the internal node of the circuit designated by the letter A there is generated the gating signal x ∨ y (the result of the parallel connection of the gates with the control signals x and y). At the node B there is generated the gating signal x, and at the node C the gating signal xy (the result of the series connection of the gates with the control signals x and y). Finally, the output signal of the entire circuit (at pole D) is the result of the parallel connection of two gate networks with the output (gated) signals (x ∨ y)z and xy.

[Fig. 6]
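The behaviour of gate circuits can be simulated with ordinary 0/1 arithmetic. The following Python sketch is our own model. A gate is taken as the conjunction of its gated and control signals, series connection as a chain of gates, and parallel connection as a disjunction (natural separation). The function `fig6` encodes our reading of the circuit described above: gates x and y in parallel followed by gate z, and gates x and y in series. The check confirms that it realizes (x ∨ y)z ∨ xy with 5 gates. The direct construction from the disjunctive normal form xz ∨ yz ∨ xy of the same function would need one gate per letter, that is, 6.

```python
from itertools import product

def gate(gated, control):
    """A gate passes its gated signal only when the control signal is 1."""
    return gated & control

def series(gated, *controls):
    """Gates in series: the signal reaches the end only if every control is 1."""
    for control in controls:
        gated = gate(gated, control)
    return gated

def parallel(*signals):
    """Signals joined at one pole; natural separation yields their disjunction."""
    result = 0
    for s in signals:
        result |= s
    return result

def fig6(x, y, z):
    """Our reading of Fig. 6, fed by the gating signal identically equal to 1."""
    a = parallel(gate(1, x), gate(1, y))  # node A: x ∨ y
    c = series(1, x, y)                   # node B carries x, node C carries xy
    return parallel(gate(a, z), c)        # pole D: (x ∨ y)z ∨ xy

FIG6_GATES = 5       # gates x, y in parallel, then z; gates x, y in series
DNF_GATES = 2 * 3    # xz ∨ yz ∨ xy: one gate per letter occurrence

for x, y, z in product((0, 1), repeat=3):
    assert fig6(x, y, z) == ((x | y) & z) | (x & y)
```
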

The gate circuits include the so-called relay-contact circuits, which are constructed using electromagnetic relays. Gates of this type (relay contacts) have two-way conductivity, transmitting the gated signals not only in the forward direction (from the gated input pole to the output pole) but also in the opposite direction. This situation gives rise to additional difficulties in the construction of the theory of relay-contact circuits (associated with the existence of the so-called bridge circuits and the appearance of paths for signal transmission which were not initially planned). Such difficulties do not usually arise in the case of electronic gates, which do not have two-way conductivity.

[Fig. 7]
In the design of gate circuits using gates of all types, the so-called cascade method (see [65]) can be of considerable assistance. This method is based on the use of the relation, valid for any boolean function f,

$$f(x_1, x_2, \ldots, x_n) = x_n\, f(x_1, \ldots, x_{n-1}, 1) \vee \bar{x}_n\, f(x_1, \ldots, x_{n-1}, 0). \qquad (34)$$

The validity of this formula is easy to see by setting $x_n = 1$ and $x_n = 0$ in it.

In application to the gate circuits, and also to the circuits constructed using the AND and OR elements, formula (34) reduces the problem of the synthesis of the circuit with the n-place output function $f(x_1, x_2, \ldots, x_n)$ to the problem of the synthesis of the circuit with two (n − 1)-place output functions $f(x_1, \ldots, x_{n-1}, 1)$ and $f(x_1, \ldots, x_{n-1}, 0)$.

The cascade of gate circuits realizing this reduction is shown in Fig. 7. With several output functions the circuit of our (n-th) cascade becomes more complicated; however, the reduction process itself remains essentially the same. Continuing the reduction process, we finally construct the required gate circuit, composed in the general case (for n variables) of n cascades.
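Relation (34) and its repeated application can be illustrated with plain Python functions on 0/1 standing in for circuits. This is our own sketch, not the construction of Fig. 7. `check_34` verifies the relation on all 2ⁿ argument sets. `cascade` applies it once per variable, from $x_n$ down to $x_1$. It records in a nested tuple which pair of (n − 1)-place functions each cascade hands on, ending in the constant values of f.

```python
from itertools import product

def cofactors(f):
    """The (n-1)-place functions f(x1, ..., x_{n-1}, 1) and f(x1, ..., x_{n-1}, 0)."""
    return (lambda *xs: f(*xs, 1)), (lambda *xs: f(*xs, 0))

def check_34(f, n):
    """Check f = x_n f(..., 1) ∨ (not x_n) f(..., 0) on all 2**n argument sets."""
    f1, f0 = cofactors(f)
    for xs in product((0, 1), repeat=n):
        *rest, xn = xs
        if f(*xs) != (xn & f1(*rest)) | ((1 - xn) & f0(*rest)):
            return False
    return True

def cascade(f, n):
    """Apply (34) cascade by cascade, n times, down to constants.

    Returns a nested tuple ("x_k", branch for x_k = 1, branch for x_k = 0).
    """
    if n == 0:
        return f()
    f1, f0 = cofactors(f)
    return (f"x{n}", cascade(f1, n - 1), cascade(f0, n - 1))

majority = lambda a, b, c: (a & b) | (a & c) | (b & c)
print(check_34(majority, 3))           # True
print(cascade(lambda a, b: a & b, 2))  # ('x2', ('x1', 1, 0), ('x1', 0, 0))
```
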

The application of a synthesis method analogous to the described 
cascade method permitted Shannon [80] to establish the following esti- 
mate of the number of gates (of any type) required for the realization 
(in the form of the output function of some gate circuit) of the ar- 
bitrary boolean function of n arguments. 

Theorem 1. For any real positive number ε there exists an integer
N = N(ε) such that any boolean function of n > N variables can
be realized in the form of the output function of a gate circuit con- 
taining no more than (+e gates. For a similar realization with 
nts 


any n no more than gates are required. 


Similar complexity estimates of circuits, but under more general assumptions about the sets of logic elements used, were established by Lupanov (see [51], for example). It has also been shown that there exist boolean functions which cannot be realized by fewer than $(1 - \varepsilon_n)\,\frac{2^{n+2}}{n}$ gates, where the quantity $\varepsilon_n$ tends to zero with unlimited increase of n.

It is of interest to generalize the results presented to the case of arbitrary boolean (m, n)-terminal networks constructed with the use of any two-input logic elements. Since every boolean (m, n)-terminal network realizes some alphabetic operator A with a finite domain of definition, the minimal possible complexity L(A) of the boolean (m, n)-terminal networks realizing the given operator A can be taken as the natural quantitative complexity estimate of the operator A itself. Here, in view of the absence of adequately substantiated reasons to give preference to a particular two-input logic element, it is clearly most natural in the construction of the indicated boolean (m, n)-terminal networks to make use of all the types of two-input logic elements, considering the circuit complexity to be the total number of logic elements composing it.

The described method is not directly suitable for the estimation of the complexity of alphabetic operators with infinite domains of definition. If, however, we require not an absolute estimate but only a relative practical estimate of the complexity of several alphabetic operators, we can first finitize (make finite) their domains of definition, discarding all the input words whose lengths exceed some number N. This number must be selected so that the probability of encountering input words longer than N in the practical application of the operators in question will be sufficiently small.

If for all n = 1, 2, ... there are given the probabilities p(n) of the occurrence of input words of length n, then we can also proceed as follows: the given alphabetic operator A is divided into the operators $A_1, A_2, \ldots$ so that the operator $A_n$ has as its domain of definition the set of all words from the domain of definition of the operator A whose length is equal to n, and it acts on these words just like the operator A (n = 1, 2, ...). Let L(n) be the complexity of the operator $A_n$, computed by the method described above. Then the complexity of the original alphabetic operator is quite naturally the infinite sum

$$\sum_{n=1}^{\infty} L(n)\, p(n).$$
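Both recipes, cutting off words longer than N and weighting the complexities L(n) by p(n), amount to simple arithmetic. The Python sketch below uses purely hypothetical data for illustration: p(n) = 2⁻ⁿ and L(n) = n. It chooses the smallest N for which longer words have probability below a tolerance δ, and it approximates the infinite sum by a partial sum.

```python
def choose_N(p, delta, n_max=10_000):
    """Smallest N such that words longer than N occur with probability < delta."""
    cumulative = 0.0
    for n in range(1, n_max + 1):
        cumulative += p(n)
        if 1.0 - cumulative < delta:
            return n
    raise ValueError("tail probability did not fall below delta")

def weighted_complexity(L, p, N):
    """Partial sum of L(n) p(n) for n = 1..N, approximating the infinite sum."""
    return sum(L(n) * p(n) for n in range(1, N + 1))

# Hypothetical data, for illustration only.
p = lambda n: 2.0 ** -n
L = lambda n: n
print(choose_N(p, 0.01))              # 7
print(weighted_complexity(L, p, 60))  # ≈ 2.0 (the full sum is exactly 2)
```
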

More rational estimates of the alphabetic operators can be ob-
tained by using discrete automata with memory for the representation
of the operators in place of the combination circuits. The fundamen- 
tals of the theory of such automata are considered in the following 
(third) chapter of the present book. 

§5. THE CONCEPT OF PROPOSITIONAL CALCULUS 

Propositional calculus is the initial and simplest portion of mathematical logic. The primary problem which mathematical logic sets itself is the formalization of the complex thought processes which go to make up so-called logical thought. This formalization is achieved by the construction of logical calculi.

Every logical calculus includes first of all some means for the formalization of the writing of various sorts of statements about which there is reason to say that they are true or false. It is customary in mathematical logic to call this sort of statement a proposition. The formalization considered here amounts to the introduction of a rigorously defined system of symbols for the designation of various sorts of operations which make it possible to construct more complex propositions from simpler ones. As a result of the formalization, we have the possibility of writing propositions in the form of formulas constructed from the introduced symbols by the use of definite rules.

In spite of the great importance of the formalization of the writing of propositions, formalization in itself does not constitute the calculus. For the construction of a particular logical calculus it is also necessary to define certain formulas and operations on formulas, termed the axioms and derivation rules of the corresponding calculus, which will make it possible to derive formally all possible logical corollaries from any given system of statements and to characterize formally all the so-called identically true propositions (formulas) of the calculus in question.

In order to understand what identically true propositions are, let us consider some examples. Propositions of the type "oxygen is a gas" or "two times two is eleven" are examples of the so-called elementary constant propositions. The elementary nature of these propositions consists in the fact that they cannot be divided into simpler component parts which themselves would be propositions. Indeed, the expressions "is a gas" or "two times two" are not complete propositions, since the question of truth or falsity has no meaning relative to them. The term "constant" in application to the propositions presented is meant to emphasize that we are considering completely defined propositions relating to completely defined areas of knowledge.

We note that the truth or falsity of these propositions depends on the conditions in which they are considered and is established, as a rule, outside the limits of mathematical logic. In application to the first proposition this is obvious (oxygen under certain conditions can be not only a gas but also a liquid or even a solid). The second proposition, however, at first glance seems obviously false. Actually, though, all we have to do is to assume that in place of the decimal number system we are using the ternary system, while retaining the familiar names of multidigit numbers, and the proposition "two times two is eleven" (2 × 2 = 1·3 + 1 = 11) changes from false to true.

Therefore, in the applications of mathematical logic it will be necessary to specify the conditions under which a particular constant proposition is made so accurately and definitely that the truth value of this proposition cannot change in the process of obtaining various sorts of conclusions and corollaries from the proposition within the framework of the logical calculus being used. Thus, any constant proposition must be considered to be true all the time or false all the time throughout the entire duration of a particular logical derivation.

In propositional calculus we are not interested in the internal 
structure of the elementary propositions, considering them as whole 
units. Therefore, for their designation it is natural to make use of 
the individual letters of some alphabet (usually Latin). Individual 
letters can also be used to designate the so-called variable proposi- 
tions. The term "variable proposition" in application to a particular 
symbol means that in place of this symbol there can always be substi- 
tuted any specific constant proposition, either true or false. 

Propositions, both constant and variable, can be combined into complex propositions by using as connectives the words "and", "or", "if ... then", "not", etc. If variable propositions occur in the composition of complex propositions, then with replacement of them by certain propositions the complex proposition may be true, and with replacement by others it may be false. For example, the complex proposition "A and B", where A and B are variable propositions, will obviously be true in the case and only in the case when both propositions A and B are true.

However, there do exist complex propositions containing variable propositions which remain true for any values which can be given to these variable propositions. For example, the complex proposition "if it is incorrect that the proposition A is false, then the proposition A is true" remains true no matter what proposition is substituted in place of the variable proposition A. Such propositions are customarily called identically true propositions. The problem of separating the identically true propositions from the set of all possible propositions is a most important task of any logical calculus.

After all our preliminary remarks we turn to the construction of 
the propositional calculus itself. 

Propositional calculus is constructed from formal objects of three types. The objects of the first type are the variable and constant propositions which are not separable into individual component parts. For their designation we shall make use of capital Latin letters (with or without subscripts), calling them propositional letters. The objects of the second type are the propositional connectives, the formal equivalents of the connective words presented above: "not", "or", "and", "if ... then". For their designation we shall make use of the corresponding symbols of negation (¬), disjunction (∨), conjunction (∧) and implication (⊃). We note that in the reading of formulas it is more convenient to render the implication symbol by the word "implies" rather than by the words "if ... then". The objects of the third type are the parentheses, which serve for expressing the order in which the propositional connectives we have listed are to operate.

Similarly to the way in which the formulas of boolean algebra were constructed at the beginning of §2, the formulas of propositional calculus are constructed from the formal objects which we have introduced. The difference lies in the use of the additional symbol ⊃ (the formal analog of the boolean implication operation), in the replacement of the dot designating conjunction (multiplication) by the symbol ∧, and in the use, as the sign of negation, of the symbol ¬ standing before the negated expression in place of the bar over it.

The formulas of propositional calculus are the individual propositional letters and also all the expressions constructed recurrently from already constructed formulas 𝔄 and 𝔅 using the rules ¬(𝔄), (𝔄) ∧ (𝔅), (𝔄) ∨ (𝔅), (𝔄) ⊃ (𝔅). Just as in the case of the formulas of boolean algebra, in order to simplify the writing, some of the parentheses can be dropped if this does not cause any ambiguity in the order of application of the propositional connectives. It is assumed that in the absence of parentheses the order of operations is determined by the sequence ¬, ∧, ∨, ⊃, and for like connectives the order is that of their appearance in the formula, read from left to right. In several cases an additional symbol ≡ (or ~) is introduced in propositional calculus as an abbreviated notation, in the form (𝔄) ≡ (𝔅), of the expression ((𝔄) ⊃ (𝔅)) ∧ ((𝔅) ⊃ (𝔄)). In using this symbol it is assumed that it occupies the last place in the sequence of symbols we have just written out (after the symbol ⊃).

As an illustration of the method used for reducing the number of parentheses, we note that the formula ¬A ∨ B ∧ C ⊃ B ∨ C is understood as ((¬A) ∨ (B ∧ C)) ⊃ (B ∨ C) and not in any other way, the formula A ≡ B ⊃ C must be understood as (A) ≡ ((B) ⊃ (C)) and not as ((A) ≡ (B)) ⊃ (C), etc.
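The conventions for dropping parentheses amount to a precedence order, so they can be checked with a small precedence-climbing parser. The Python sketch below is our own. It assumes capital letters with optional digit subscripts as propositional letters, and it reads "order of appearance, left to right" as grouping like connectives to the left. It restores every parenthesis in a formula written with the abbreviations.

```python
import re

# Binding strength of the binary connectives: ∧ binds tightest, ≡ loosest.
# ¬ binds tighter than all of them and is handled in operand().
PREC = {"∧": 3, "∨": 2, "⊃": 1, "≡": 0}
TOKEN = re.compile(r"\s*([A-Z][0-9]*|[¬∧∨⊃≡()])")

def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected symbol at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def parse(tokens):
    """Build a syntax tree; like connectives group from left to right."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of formula")
        pos += 1
        return tokens[pos - 1]

    def operand():
        tok = take()
        if tok == "¬":
            return ("¬", operand())
        if tok == "(":
            node = expression(0)
            if take() != ")":
                raise SyntaxError("missing ')'")
            return node
        if tok in PREC or tok == ")":
            raise SyntaxError(f"unexpected {tok!r}")
        return tok  # a propositional letter

    def expression(min_prec):
        left = operand()
        while peek() in PREC and PREC[peek()] >= min_prec:
            op = take()
            # PREC[op] + 1: a second occurrence of op ends the right operand.
            left = (op, left, expression(PREC[op] + 1))
        return left

    tree = expression(0)
    if pos != len(tokens):
        raise SyntaxError("extra symbols after formula")
    return tree

def parenthesize(tree):
    """Write the tree back with every subformula in parentheses."""
    if isinstance(tree, str):
        return tree
    if tree[0] == "¬":
        return f"(¬{parenthesize(tree[1])})"
    op, left, right = tree
    return f"({parenthesize(left)} {op} {parenthesize(right)})"

print(parenthesize(parse(tokenize("¬A ∨ B ∧ C ⊃ B ∨ C"))))
# (((¬A) ∨ (B ∧ C)) ⊃ (B ∨ C))
```

Apart from the outermost pair of parentheses, the output coincides with the reading given in the text. Similarly, A ≡ B ⊃ C comes out as (A ≡ (B ⊃ C)).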

The definitions introduced above resolve only the first part of 
the problem of the construction of the propositional calculus - the 
problem of the formalization of the writing of the complex proposi- 
tions. The second part of this problem - finding the method for the 
determination of the identically true propositions — can be resolved 
in two ways: the contensive and formal approaches. 

In the contensive approach, which is easier to understand, we cannot for a moment forget about the contensive meaning of the concepts of the propositional letters and the propositional connectives. In this case maximal use is made of the basic notion of contensive meaning. Thus, if the propositional letter A designates a particular constant proposition, then there is no need to remember this proposition itself; it is necessary only to know the value of the so-called truth function of this proposition: "true" if the proposition A is true, and "false" if the proposition A is false.

The truth function of any propositional letter denoting a variable proposition is identified with this letter itself, considering it as a boolean variable. Thus, the contensive meaning of the propositional letters in our construction is exhausted by their capability of taking two values: "true" and "false".

We shall associate the contensive meaning of the propositional connectives only with the truth functions of the complex propositions constructed with their use. Every formula 𝔄 of propositional calculus can be interpreted as a formula of boolean algebra with the additional operation of implication included in it. The constant propositions appearing in the formula 𝔄 must be replaced by the corresponding boolean constants (the values of their truth functions). The symbols corresponding to the variable propositions are considered as arguments of the boolean function represented by the formula 𝔄. This function is then termed the truth function of the complex proposition expressed by the formula 𝔄.

On the contensive level of the construction of propositional cal- 
culus, those and only those formulas of this calculus (complex pro- 
positions) whose truth functions take the value "true" for all values 
of the variables are considered to be identically true. 

We recall that, as a result of the agreement made in §1, the value "true" corresponds to one and the value "false" corresponds to zero. Using the value tables presented in §1 for the conjunction x·y (table expressed by the cortege (0001)), for the disjunction x ∨ y (cortege (0111)), for the implication x → y (cortege (1101)), and recalling that negation transforms 1 into 0 and 0 into 1, it is easy to find the value table of the truth function for any formula of propositional calculus. This table is usually termed the truth table of the formula in question (or of the complex proposition defined by it). In filling in the table we use the abbreviated designations T for true and F for false. As an example we present the truth table for the formula A ⊃ B, which defines the contensive meaning of implication (considered as a propositional connective):

    A   B   A ⊃ B
    T   T     T
    T   F     F
    F   T     T
    F   F     T

From this table we see that the meaning of the term "implies" (corresponding to the propositional connective ⊃) in propositional calculus is somewhat different from that in ordinary speech. Indeed, when we say that some proposition A implies another proposition B, we usually have in mind that the propositions A and B are causally related with one another. Thus, the complex proposition which states that the proposition "this substance is oxygen" implies the proposition "this substance is a gas" seems to us (with the reservation made above on the gaseous nature of oxygen) both true and reasonable. At the same time, the complex proposition which states that the proposition "it is cold in the winter" implies the proposition "two times two is four" seems to us complete nonsense. However, on the strength of the truth table constructed above for the formula A ⊃ B, in propositional calculus the second of these complex propositions must be considered true to no less a degree than the first.
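The truth table of implication, and its cortege, can be reproduced with a short Python sketch. The helper names are our own. Truth values are modelled by Python booleans, and the cortege is read over the sets 00, 01, 10, 11 with 1 standing for "true".

```python
from itertools import product

def implies(a, b):
    """Truth function of A ⊃ B: false only when A is true and B is false."""
    return (not a) or b

def truth_table(fn, names, label):
    """Header plus one T/F row for every set of truth values of the letters."""
    tf = lambda v: "T" if v else "F"
    lines = ["  ".join(list(names) + [label])]
    for values in product((True, False), repeat=len(names)):
        lines.append("  ".join(tf(v) for v in values + (fn(*values),)))
    return lines

print("\n".join(truth_table(implies, ("A", "B"), "A ⊃ B")))

# The cortege of implication over the sets 00, 01, 10, 11.
cortege = "".join(str(int(implies(x, y))) for x, y in product((0, 1), repeat=2))
print(cortege)  # 1101
```

The single F in the last column is the only case excluded by implication: a true premise with a false conclusion.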

The reason for this presumption is not difficult to understand. Indeed, in limiting ourselves by the condition of considering the elementary propositions only from the standpoint of whether they are true or false, we have thereby made all the true (and all the false) elementary propositions quite indistinguishable from one another. Therefore, in particular, in defining the content embedded in the connective ⊃ we are forced to operate only with the concepts of truth and falsity, and in this direction it is obviously not possible to penetrate into the inner structure of elementary propositions of all kinds, which, of course, is necessary for the establishment of causal connections between them.

Disclosure of the internal structure of the elementary propositions 
and the associated increase of the capabilities for logical analysis 
are achieved by means of more complex logical calculi, in particular
the so-called predicate calculus (see Chapter 6). As for propositional 
calculus, we must content ourselves with the relative poverty of its 
expressive capabilities, accepting this as a sort of payment for the 
simplicity and clarity of this calculus. 

The propositional connective ⊃ is used later on as an instrument for obtaining logical corollaries from particular formulas of propositional calculus and in the other, higher logical calculi. Such corollaries must be true whenever the original formulas are true. Therefore, the construction of the derivation must of necessity exclude the possibility of obtaining false corollaries from true original formulas (such an outcome would indicate that the construction is false). At the same time, when the original information is false, the obtaining of any corollaries (true as well as false) does not, of course, indicate falsity of the construction of the derivation itself. This circumstance finds its concrete expression in the truth table for the formula A ⊃ B. All we have said here will become more understandable after acquaintance with the formal aspect of propositional calculus.
The contensive aspect of propositional calculus which we have described makes it relatively easy to resolve the question of the identical truth of any complex proposition (given by some formula of the calculus): it is sufficient to run through all possible sets of truth values of the variable propositions composing it and verify whether the truth function of the complex proposition in question takes the value "true" on all these sets. For example, the formula A ∧ B ⊃ A will be an identically true formula of propositional calculus on the basis of the following verification: if A = T and B = T, then the formula A ∧ B ⊃ A reduces to T ⊃ T, which, in view of the truth table, gives the value T; with A = T and B = F it reduces to F ⊃ T, which again gives T; with A = F, whatever the value of B, the formula reduces to F ⊃ F, which on the basis of the truth table also leads to the value T.
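The contensive test for identical truth is exhaustive evaluation over all sets of truth values, and it can be sketched directly in Python. In this sketch, which is our own illustration, formulas are written as Python functions of their propositional letters. The helper `counterexample` reports the first set of values that refutes a formula that is not identically true.

```python
from itertools import product

def implies(a, b):
    return (not a) or b

def is_identically_true(fn, n):
    """Contensive test: fn must be true on all 2**n sets of truth values."""
    return all(fn(*values) for values in product((False, True), repeat=n))

def counterexample(fn, n):
    """First set of truth values on which fn is false, or None."""
    return next((v for v in product((False, True), repeat=n) if not fn(*v)), None)

print(is_identically_true(lambda a, b: implies(a and b, a), 2))  # True
print(counterexample(lambda a, b: implies(a, b), 2))             # (True, False)
```

The number of sets to examine doubles with every additional letter, and the method presupposes that this number is finite.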

We can also use the technique of transformations in boolean algebra for the proof of the identical truth of the formulas of propositional calculus. We need only first replace all the implications according to the formula (𝔄 ⊃ 𝔅) = (¬𝔄 ∨ 𝔅) (see §1). Applied to the example we have just considered, this gives the following chain of transformations:

$$A \wedge B \supset A = \neg(A \wedge B) \vee A = (\neg A \vee \neg B) \vee A = \neg A \vee A \vee \neg B = 1 \vee \neg B = 1.$$

This chain proves the identical truth of the formula we started with.

The identically false propositions can be considered similarly, 
i.e., those (complex) propositions whose truth functions take the val- 
ue "false" for all values of the variable propositions composing the 
given proposition. It is easy to understand that the class of all 
identically false propositions coincides with the negations of all 
possible identically true propositions. 

In spite of its simplicity and clarity, the contensive aspect of
propositional calculus also has several drawbacks. First, the method
of proving the truth of formulas by sorting over all sets of values
of the arguments cannot be transferred directly to the more complex
calculuses, in which the number of such sets may be infinite. Second,
the methods derived above do not directly permit the determination of
propositions which are not identically true but are true under
particular additional assumptions (for example, the formula
$A \supset B$, which is not identically true, becomes true under the
condition that the formula $A \wedge \neg B$ is false). But problems
of this sort constantly arise in the various applications of logical
calculus. We can, it is true, develop the corresponding methods within
the framework of boolean algebra; however, this aggravates still
another essential difficulty associated with the contensive aspect of
propositional calculus: the insufficient formalization of the proof
process and of the very concept of the proof of the truth of
particular formulas.

These deficiencies are eliminated in the completely formal ap- 
proach to the construction of propositional calculus, which formalizes 
not only the method of writing the formulas (the method already de- 
scribed is adequate for this), but also the concept of the identically 
true formulas and the process of the derivation of the logical
corollaries from particular propositions.

The formal aspect of propositional calculus is characterized by 
the fact that in this case we completely avoid the contensive meaning 
of the formulas, regarding them simply as finite sequences of individ- 
ually distinguishable symbols. 

For the characterization of the set of all identically true
formulas we construct the axiom system of the calculus in question.
Such systems can be chosen in various ways. We shall consider one of
the most widely used axiom systems of propositional calculus (see
Kleene [42]). This system includes the following axioms:
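1. $A \supset (B \supset A)$.

2. $(A \supset B) \supset ((A \supset (B \supset C)) \supset (A \supset C))$.

3. $A \supset (B \supset A \wedge B)$.

4. $A \wedge B \supset A$.

5. $A \wedge B \supset B$.

6. $A \supset A \vee B$.

7. $B \supset A \vee B$.

8. $(A \supset C) \supset ((B \supset C) \supset (A \vee B \supset C))$.

9. $(A \supset B) \supset ((A \supset \neg B) \supset \neg A)$.

10. $\neg\neg A \supset A$.

11. $\dfrac{A,\; A \supset B}{B}$.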

The first ten axioms are simply ten formulas of propositional
calculus which are identically true by definition. The identical truth
of the axioms makes it possible to substitute any formulas of
propositional calculus (not necessarily true ones) for the
propositional letters A, B, C appearing in them. Such a substitution,
by definition, does not destroy the identical truth of the formula
(axiom) subjected to it.

The eleventh axiom has its own specific nature. It is the so-called
rule of derivation, which makes it possible, by definition, to
consider the truth of formula B proved if the truth of formulas A and
$A \supset B$ has already been proved previously. If the formulas A
and $A \supset B$ are identically true, then formula B will also be
identically true. It is presumed by definition that all identically
true formulas (and only such formulas) of propositional calculus can
be obtained from the axioms as the result of the described
substitutions and (generally speaking, multiple) applications of the
derivation rule 11.

It by no means follows a priori that the set of all identically
true formulas of propositional calculus, characterized formally in
this fashion, will coincide with the set of all identically true
formulas defined contensively above. Since the formal identical truth
of the formulas is established by the procedure of derivation or
proof, they are also termed (formally) provable formulas or (formal)
theorems.

The formulas which are identically true in the contensive sense 
we shall for brevity term simply contensively true, contrasting them 
with the formally true (i.e., formally provable) formulas. 

The concept of formal derivation (proof) can be extended to the
case when, in addition to the axioms, some number of formulas
$A_1, A_2, \ldots, A_n$ of the propositional calculus are given as
conditionally true formulas. These formulas are not derivable from the
axioms (not formally provable) and therefore are not formally true
formulas. The presumption of their truth is of a conditional nature
and is retained only in the course of the derivation in question. In
contrast with the axioms of propositional calculus, in these formulas
we cannot, generally speaking, replace the propositional letters
appearing in them by arbitrary formulas. In other words, conditional
truth, in contrast with formal truth, does not have an identical
nature.

However, the rules of derivation themselves are in essence retained
as before. The primary role, as before, is played by the concept of
the direct corollary (axiom 11): formula B is termed the direct
corollary of the formulas A and $A \supset B$. We say that the formula
C of propositional calculus is derived from formulas
$A_1, A_2, \ldots, A_n$ if it can be obtained from these formulas and
axioms 1-10 of propositional calculus as the result of the application
(a finite number of times) of the rule of the direct corollary. More
precisely, in the case being considered those formulas and only those
formulas will be derivable which are obtained as the result of the
sequential application of three rules:

1. Any of the formulas $A_1, A_2, \ldots, A_n$ is derivable.

2. Any of the axioms 1-10 (with account for the possibility of the
substitution of any formulas in place of the letters appearing in
them) is derivable.

3. If the formulas A and $A \supset B$ are derivable, then formula B
will also be derivable.

The chain of formulas obtained as the result of the sequential
application of these three rules, which terminates with some formula
C, is termed the formal derivation of this formula.

To designate derivability we make use of the special symbol
$\vdash$ (read as "gives"), to the left of which are written the
conditionally true formulas and to the right their corollaries:
$A_1, A_2, \ldots, A_n \vdash C$. The axioms of propositional calculus
are not written out explicitly here (the possibility of their use in
the derivation is already included in the symbol $\vdash$), so that
for any formally true formula B we can write $\vdash B$. In other
words, the formally true formulas are considered derivable from the
empty set of (conditionally true) formulas. Therefore axioms 1-10 can
also be considered as a sort of rules of derivation which derive the
formulas representing them from the empty set of formulas.

We shall present very simple examples of formal derivation,
numbering the sequential steps.

1. $A \supset (A \supset A)$ (axiom 1, in which the letter B is
replaced by the letter A).

2. $(A \supset (A \supset A)) \supset ((A \supset ((A \supset A) \supset A)) \supset (A \supset A))$
(axiom 2, in which the letter B is replaced by the formula
$A \supset A$, and the letter C is replaced by the letter A).

3. $(A \supset ((A \supset A) \supset A)) \supset (A \supset A)$
(application of the derivation rule 11 to the formulas obtained in
steps 1 and 2).

4. $A \supset ((A \supset A) \supset A)$ (axiom 1, in which the letter
B is replaced by the formula $A \supset A$).

5. $A \supset A$ (application of the derivation rule 11 to the
formulas obtained in the preceding two steps).

This chain of formulas is, on the basis of the definition, the formal
proof of the formula $A \supset A$, i.e., its derivation from the
empty set of (conditionally true) formulas. Thus, the formula
$A \supset A$ belongs to the formally true formulas, and we can write
$\vdash A \supset A$.
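The five steps above can be replayed mechanically: the axiom instances come from substitution and steps 3 and 5 from rule 11. `substitute` and `mp` are hypothetical helpers for this sketch; `mp` raises an error when its premises do not have the form X and X ⊃ Y.

```python
def imp(x, y):
    return ("imp", x, y)

def substitute(f, mapping):
    """Simultaneous replacement of letters (strings) inside a tuple formula."""
    if isinstance(f, str):
        return mapping.get(f, f)
    return (f[0],) + tuple(substitute(g, mapping) for g in f[1:])

def mp(premise, implication):
    """Rule 11: from X and X ⊃ Y obtain Y."""
    if not (isinstance(implication, tuple) and implication[0] == "imp"
            and implication[1] == premise):
        raise ValueError("rule 11 does not apply")
    return implication[2]

AX1 = imp("A", imp("B", "A"))
AX2 = imp(imp("A", "B"), imp(imp("A", imp("B", "C")), imp("A", "C")))
AA = imp("A", "A")

s1 = substitute(AX1, {"B": "A"})             # A ⊃ (A ⊃ A)
s2 = substitute(AX2, {"B": AA, "C": "A"})    # axiom 2 with B := A ⊃ A, C := A
s3 = mp(s1, s2)                              # (A ⊃ ((A ⊃ A) ⊃ A)) ⊃ (A ⊃ A)
s4 = substitute(AX1, {"B": AA})              # A ⊃ ((A ⊃ A) ⊃ A)
s5 = mp(s4, s3)                              # A ⊃ A
print(s5 == AA)  # True
```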

Another example is the derivation of corollaries from the three
conditionally true formulas A, B, $A \supset (B \supset C)$. The
formula C can be derived from these formulas in 5 steps.

1. A (first given (conditionally true) formula).

2. B (second given formula).

3. $A \supset (B \supset C)$ (third given formula).

4. $B \supset C$ (direct corollary (derivation rule 11) of formulas
1 and 3).

5. C (direct corollary of formulas 2 and 4).

Thus, formula C is derivable from formulas A, B,
$A \supset (B \supset C)$, and we can write
$A, B, A \supset (B \supset C) \vdash C$.
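The same bookkeeping for this second example, with a hypothetical helper `mp` standing for rule 11. Note that step 4 needs formulas 1 and 3; pairing formula 2 with formula 3 does not fit the rule.

```python
def imp(x, y):
    return ("imp", x, y)

def mp(premise, implication):
    """Rule 11: from X and X ⊃ Y obtain Y; refuse if the premises do not fit."""
    if not (isinstance(implication, tuple) and implication[0] == "imp"
            and implication[1] == premise):
        raise ValueError("rule 11 does not apply")
    return implication[2]

given = ["A", "B", imp("A", imp("B", "C"))]   # formulas 1-3
step4 = mp(given[0], given[2])                # B ⊃ C, from formulas 1 and 3
step5 = mp(given[1], step4)                   # C, from formulas 2 and 4
print(step4, step5)  # ('imp', 'B', 'C') C

try:
    mp(given[1], given[2])                    # B with A ⊃ (B ⊃ C): no direct corollary
except ValueError as err:
    print(err)                                # rule 11 does not apply
```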

Although the conditionally true formulas do not possess identical
truth, still, as is easy to see, in the final record of (conditional)
derivability with the symbol $\vdash$ any letter can be replaced by an
arbitrary formula of propositional calculus, provided that the
replacement is performed simultaneously both to the left and to the
right of the derivability symbol. Replacement on only one side will,
generally speaking, lead to error.


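In similar fashion we can prove the relations

chain conclusion: $A \supset B,\; B \supset C \vdash A \supset C$; (35)

permutation of premises: $A \supset (B \supset C) \vdash B \supset (A \supset C)$; (36)

importation: $A \supset (B \supset C) \vdash A \wedge B \supset C$; (37)

exportation: $A \wedge B \supset C \vdash A \supset (B \supset C)$; (38)

contraposition: $A \supset B \vdash \neg B \supset \neg A$; (39)

$\wedge$-insertion: $A, B \vdash A \wedge B$; (40)

weak $\neg$-removal: $A, \neg A \vdash B$. (41)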
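The relations (35)-(41) are statements of formal derivability; the sketch below only checks the weaker, contensive property that every assignment making the premises true also makes the conclusion true. It assumes truth values are Python booleans and is an illustration, not a formal derivation.

```python
from itertools import product

def imp(x, y):
    return (not x) or y

# Relations (35)-(41) as (premises, conclusion), written as truth functions
# of the values a, b, c of the letters A, B, C.
RELATIONS = {
    "(35) chain conclusion": (lambda a, b, c: [imp(a, b), imp(b, c)],
                              lambda a, b, c: imp(a, c)),
    "(36) permutation of premises": (lambda a, b, c: [imp(a, imp(b, c))],
                                     lambda a, b, c: imp(b, imp(a, c))),
    "(37) importation": (lambda a, b, c: [imp(a, imp(b, c))],
                         lambda a, b, c: imp(a and b, c)),
    "(38) exportation": (lambda a, b, c: [imp(a and b, c)],
                         lambda a, b, c: imp(a, imp(b, c))),
    "(39) contraposition": (lambda a, b, c: [imp(a, b)],
                            lambda a, b, c: imp(not b, not a)),
    "(40) and-insertion": (lambda a, b, c: [a, b],
                           lambda a, b, c: a and b),
    "(41) weak not-removal": (lambda a, b, c: [a, not a],
                              lambda a, b, c: b),
}

def semantically_valid(premises, conclusion):
    """True if every assignment making all premises true makes the conclusion true."""
    return all(conclusion(*v) for v in product([True, False], repeat=3)
               if all(premises(*v)))

for name, (prem, concl) in RELATIONS.items():
    print(name, semantically_valid(prem, concl))
```

Relation (41) holds vacuously here: no assignment makes both A and ¬A true.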


If we designate by $\Gamma$ an arbitrary finite ensemble of formulas
of propositional calculus, then, using a somewhat more complex method
of proof (induction over the steps of the derivation), we can obtain
the following result (the so-called deduction theorem).

Theorem 1. If in the propositional calculus formula B is derivable
from the combination of formulas $\Gamma$ and A, then the formula
$A \supset B$ is derivable from $\Gamma$.
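One standard way to prove Theorem 1 (as in Kleene's treatment) is constructive: each step $C_i$ of a derivation of B from $\Gamma$, A is replaced by a few steps ending in $A \supset C_i$, using only axioms 1 and 2 and rule 11. The sketch below assumes a hypothetical record format in which each step carries its justification; it illustrates that construction and is not the book's own proof.

```python
def imp(x, y):
    return ("imp", x, y)

def deduce(steps, a):
    """Turn a derivation of B from Γ, A into a derivation of A ⊃ B from Γ.

    Each step is (formula, reason) with reason ("hyp",), ("ax", n) or
    ("mp", i, j), where step j is (formula of step i) ⊃ (this formula)."""
    out, where = [], {}          # where[i]: index in `out` of A ⊃ C_i

    def emit(f, reason):
        out.append((f, reason))
        return len(out) - 1

    for i, (c, reason) in enumerate(steps):
        if c == a:                                 # C_i is A: proof of A ⊃ A
            s1 = emit(imp(a, imp(a, a)), ("ax", 1))
            s2 = emit(imp(imp(a, imp(a, a)),
                          imp(imp(a, imp(imp(a, a), a)), imp(a, a))), ("ax", 2))
            s3 = emit(imp(imp(a, imp(imp(a, a), a)), imp(a, a)), ("mp", s1, s2))
            s4 = emit(imp(a, imp(imp(a, a), a)), ("ax", 1))
            where[i] = emit(imp(a, a), ("mp", s4, s3))
        elif reason[0] in ("hyp", "ax"):           # C_i, C_i ⊃ (A ⊃ C_i), A ⊃ C_i
            s1 = emit(c, reason)
            s2 = emit(imp(c, imp(a, c)), ("ax", 1))
            where[i] = emit(imp(a, c), ("mp", s1, s2))
        else:                                      # C_i from C_j and C_k = C_j ⊃ C_i
            _, j, k = reason
            cj = steps[j][0]
            s1 = emit(imp(imp(a, cj), imp(imp(a, imp(cj, c)), imp(a, c))), ("ax", 2))
            s2 = emit(imp(imp(a, imp(cj, c)), imp(a, c)), ("mp", where[j], s1))
            where[i] = emit(imp(a, c), ("mp", where[k], s2))
    return out

# Γ = {A ⊃ B}; derivation of B from Γ, A: A, A ⊃ B, B.
steps = [("A", ("hyp",)), (imp("A", "B"), ("hyp",)), ("B", ("mp", 0, 1))]
result = deduce(steps, "A")
print(result[-1][0])  # ('imp', 'A', 'B')
```

In the output, A itself no longer appears as a hypothesis; only members of $\Gamma$ keep the reason `("hyp",)`.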

Two universal proof schemes are also of importance in the theory
of proofs.

1. Proof by means of analysis of cases: if $\Gamma, A \vdash C$ and
$\Gamma, B \vdash C$, then
$$\Gamma, A \vee B \vdash C. \qquad (42)$$

2. Reductio ad absurdum: if $\Gamma, A \vdash B$ and
$\Gamma, A \vdash \neg B$, then
$$\Gamma \vdash \neg A. \qquad (43)$$
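For the schemes (42) and (43), a contensive spot check on one concrete choice of $\Gamma$ can be sketched as follows. `entails` is a hypothetical helper that checks semantic consequence over the letters A, B, C, so this only illustrates the schemes on an example; it does not prove them.

```python
from itertools import product

def imp(x, y):
    return (not x) or y

def entails(premises, conclusion):
    """Semantic consequence over the letters A, B, C (values a, b, c)."""
    return all(conclusion(*v) for v in product([True, False], repeat=3)
               if all(p(*v) for p in premises))

A = lambda a, b, c: a
B = lambda a, b, c: b
C = lambda a, b, c: c

# Scheme (42) with Γ = {A ⊃ C, B ⊃ C}.
gamma42 = [lambda a, b, c: imp(a, c), lambda a, b, c: imp(b, c)]
cases_hold = entails(gamma42 + [A], C) and entails(gamma42 + [B], C)
conclusion42 = entails(gamma42 + [lambda a, b, c: a or b], C)

# Scheme (43) with Γ = {A ⊃ B, A ⊃ ¬B}.
gamma43 = [lambda a, b, c: imp(a, b), lambda a, b, c: imp(a, not b)]
absurd = entails(gamma43 + [A], B) and entails(gamma43 + [A], lambda a, b, c: not b)
conclusion43 = entails(gamma43, lambda a, b, c: not a)
print(cases_hold, conclusion42, absurd, conclusion43)  # True True True True
```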

It is easy to verify that all the axioms 1-10 of the propositional
calculus are contensively true formulas. In other words, the truth
functions corresponding to them take the value "true" for all values
of the variables. This property is obviously retained under
substitution of any formulas of the propositional calculus for the
letters appearing in the axioms. From the truth table for implication
it follows directly that the contensive truth of the formulas A and
$A \supset B$ implies the contensive truth of the formula B. But
then, obviously, all the provable (formally true) formulas will
inevitably be contensively true. The converse is also true (although
much more complex to prove), so that the following important result
can be formulated.
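The verification mentioned above can be sketched by encoding each of the axioms 1-10 as a truth function of the values of A, B, C (this encoding is an assumption of the sketch). Since an axiom holds for every truth value of its letters, it also holds after any substitution: a substituted formula can only take one of those values.

```python
from itertools import product

def imp(x, y):
    return (not x) or y

# Axioms 1-10 as truth functions of the values a, b, c of the letters A, B, C.
AXIOMS = [
    lambda a, b, c: imp(a, imp(b, a)),
    lambda a, b, c: imp(imp(a, b), imp(imp(a, imp(b, c)), imp(a, c))),
    lambda a, b, c: imp(a, imp(b, a and b)),
    lambda a, b, c: imp(a and b, a),
    lambda a, b, c: imp(a and b, b),
    lambda a, b, c: imp(a, a or b),
    lambda a, b, c: imp(b, a or b),
    lambda a, b, c: imp(imp(a, c), imp(imp(b, c), imp(a or b, c))),
    lambda a, b, c: imp(imp(a, b), imp(imp(a, not b), not a)),
    lambda a, b, c: imp(not (not a), a),
]
ROWS = list(product([True, False], repeat=3))
axioms_true = [all(ax(*row) for row in ROWS) for ax in AXIOMS]
print(axioms_true.count(True))  # 10

# Rule 11: whenever x and x ⊃ y are both true, y is true as well.
rule_11_sound = all(y for x, y in product([True, False], repeat=2) if x and imp(x, y))
print(rule_11_sound)  # True
```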

Theorem 2. In the formal construction of propositional calculus
using the system of axioms 1-11, all those and only those formulas of
this calculus will be provable (formally true) which are identically
true in the contensive sense.

Theorem 2 contains, actually, two results relative to the selected
system S of axioms of the propositional calculus. The first result is
that the system S is contensively consistent or, in other words, with
the aid of the system S we cannot prove a single formula which is not
a contensively true formula.

The second result states the contensive completeness of the system
of axioms S: there is not a single contensively true formula of
propositional calculus which cannot be proved formally with the aid of
this system of axioms.

The question arises whether it is possible to define the properties
of consistency and completeness purely formally, without resorting to
contensive constructions. It turns out that this is possible.

It is natural to term the system of axioms of propositional
calculus formally consistent if with its aid we cannot derive any
formula A together with its negation $\neg A$, and formally
inconsistent in the opposite case.

From the property of weak $\neg$-removal (41) it follows directly
that in the case of formal inconsistency of the system of axioms any
formula of propositional calculus would be formally provable. Since,
as a result of Theorem 2, the latter situation does not occur for the
system S, this system is not only contensively consistent but also
formally consistent or, as is often said, is simply a consistent
system of axioms.


The property of formal completeness for the system of axioms can 
be defined as follows: a system of axioms is termed formally complete
(or complete in the restricted sense) if the addition to this system 
as a new axiom of any formula which is not provable in the system 
leads to the system of axioms thus expanded being formally inconsis- 
tent. In this case it is usually presumed that the original axiom 
system was formally consistent. 

It can be shown that the system of axioms 1-11 of propositional
calculus which we have introduced is not only a contensively but also
a formally complete system of axioms. Given contensive consistency,
and with rule 11 as the only derivation rule, contensive completeness
follows from formal completeness, since otherwise any nonprovable
contensively true formula could be used for a consistent expansion of
the original axiom system.

In the axiom system we have chosen there is not a single redundant
axiom. More precisely, none of the formulas 1-10 can be formally
proved with the aid of the ensemble of all the remaining axioms. This
property is termed the independence of the axioms of the selected
system. Independence is proved separately for each axiom by
constructing a contensive interpretation under which this axiom is
not satisfied while all the remaining axioms are satisfied.

We note, finally, that although joining unprovable formulas as new
axioms to the propositional calculus axiom system S which we have
chosen destroys, on the strength of the formal completeness of this
system, its formal consistency, nothing prevents us from joining to
the system S the formulas $A_1, \ldots, A_n$ (unprovable in S) as
conditionally true formulas rather than as identically true formulas.
It can be shown that inconsistency (the possibility of deriving some
formula together with its negation) arises in this case if and only
if the conjunction $A_1 \wedge A_2 \wedge \ldots \wedge A_n$ is an
identically false formula.
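The criterion just stated can be illustrated by searching for an assignment that makes all conditionally true formulas true at once. The helper name and the two sample sets are hypothetical; in the second set, A and $A \supset B$ give B while $\neg B$ is also given.

```python
from itertools import product

def imp(x, y):
    return (not x) or y

def conjunction_identically_false(formulas, n):
    """True if no assignment to the n letters makes all formulas true at once."""
    return not any(all(f(*v) for f in formulas)
                   for v in product([True, False], repeat=n))

# Conditionally true formulas over the letters A, B (values a, b).
consistent = [lambda a, b: imp(a, b), lambda a, b: a]
inconsistent = [lambda a, b: imp(a, b), lambda a, b: a, lambda a, b: not b]
print(conjunction_identically_false(consistent, 2))    # False
print(conjunction_identically_false(inconsistent, 2))  # True
```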

As a result of this joining there arises a formal theory which goes
beyond the framework of mathematical logic proper, since the joined
formulas $A_1, A_2, \ldots, A_n$ are not true in the strictly logical
sense. If our constructions carry some particular contensive meaning,
then the contensive truth of the formulas $A_1, A_2, \ldots, A_n$
must be postulated or have some clearly extra-logical basis. In that
case it is natural to consider these formulas as axioms of the formal
theory constructed on their basis. In order not to confuse them with
the logical axioms 1-11 themselves, the latter are in this case termed
not axioms but axiom schemes, thereby emphasizing that each of the
axioms 1-11 is actually a whole set of axioms obtained from the
corresponding formula by replacing the letters appearing in it with
arbitrary formulas of the propositional calculus.
Chapter 3
THEORY OF AUTOMATA 
§1. ABSTRACT AUTOMATA AND AUTOMATON REPRESENTATIONS 

Let us consider the alphabetic transformations realizable by 
discrete information processors which put out some output signal 
(letter of the output alphabet) in response to each input signal 
(letter of the input alphabet). Such processors, considered without 
regard to their internal structure, are customarily termed abstract 
automata. 

For the specification of an abstract automaton, three sets must 
be given: the input alphabet X, the output alphabet Y, and the set of
internal states of the automaton, which we shall denote by the letter 
A. The automaton operates in discrete time, whose sequential moments 
are conveniently identified with the sequential natural numbers 
t = 0, 1, 2, ... (which we can always do by suitable choice of the 
time measurement unit). 

At every given instant of discrete automaton time t = 0, 1, 2, ...
the automaton A is in some definite state a = a(t) of the set A of
its internal states, which for brevity we shall term the state set of
the automaton A. The state $a_0 = a(0)$ at the initial instant of time
t = 0 is termed the initial state of the automaton A. If the initial
state remains unchanged during any experiments with the automaton,
then this automaton is termed an initial automaton. Since, however, in
practice we do not consider any automata other than initial ones, the
term "initial" is frequently dropped.

At every instant t of automaton time, beginning with t = 1, one of
the letters x = x(t) of the input alphabet X is applied to the input
of the automaton as the input signal. The finite ordered sequences of
input signals x(1)x(2) ... x(k) of the automaton are termed the input
words of this automaton. Any input word from some a priori fixed set
of admissible input words can be applied to the input of the
automaton.

Any admissible word p = x(1)x(2) ... x(k), applied to the input of a
given initial automaton A, causes the appearance at the output of the
automaton of the output word q = y(1)y(2) ... y(k), which is some
ordered finite sequence of the output signals of the automaton A
(letters of its output alphabet Y) having the same length as its
corresponding input word p and which is uniquely determined by the
input word p. The resulting correspondence $\varphi$ between the
admissible input words p and their corresponding output words q is
termed the (alphabetic) representation induced by the initial
automaton A in question.

This representation $\varphi$ is uniquely determined by specifying
the two functions $\delta$ and $\lambda$, termed respectively the
switching function and the output function of the automaton A in
question.

The switching function determines the state a(t) of the automaton
at any instant of discrete automaton time t from the input signal x(t)
at that same instant and from the state a(t − 1) at the preceding
instant of automaton time:
$$a(t) = \delta(a(t-1), x(t)). \qquad (44)$$

The output function determines the output signal y(t) of the
automaton from these same variables:
$$y(t) = \lambda(a(t-1), x(t)). \qquad (45)$$

Specifying any input word p = x(1)x(2) ... x(k) and the initial
state a(0) of the automaton, with the aid of relations (44) and (45)
we can sequentially determine all the letters of the corresponding
output word
$$q = \varphi(p) = y(1)y(2) \ldots y(k).$$

Thus, the relations (44) and (45) actually define the representation
$\varphi$ induced by the automaton.

The switching and output functions are usually abstract partial
functions $\delta(a, x)$ and $\lambda(a, x)$ which specify
single-valued mappings of some set of pairs (a, x) ($a \in A$,
$x \in X$) into the sets A and Y respectively. Admissible input words
are those and only those input words p for which the corresponding
output words $\varphi(p)$ are determined with the aid of the functions
$\delta$ and $\lambda$ by the method described above.
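Relations (44) and (45), together with the notion of admissible words, can be sketched as a short simulation. The dictionary representation of the partial functions δ and λ and the two-state automaton below are hypothetical (the example tables of the text are not used). A word is treated as admissible here when λ is defined at every step and δ wherever a further letter follows.

```python
def run_mealy(delta, lam, a0, word):
    """Output word of a (possibly partial) Mealy automaton, or None if the
    input word is not admissible."""
    state, out = a0, []
    for t, x in enumerate(word, start=1):
        if (state, x) not in lam:
            return None
        out.append(lam[(state, x)])        # y(t) = λ(a(t-1), x(t)), relation (45)
        if t < len(word):                  # δ is needed only if another letter follows
            if (state, x) not in delta:
                return None
            state = delta[(state, x)]      # a(t) = δ(a(t-1), x(t)), relation (44)
    return "".join(out)

# Hypothetical partial automaton: δ and λ are undefined on the pair (2, "y").
delta = {(1, "x"): 2, (1, "y"): 1, (2, "x"): 1}
lam = {(1, "x"): "u", (1, "y"): "v", (2, "x"): "v"}
print(run_mealy(delta, lam, 1, "xxy"))  # uvv
print(run_mealy(delta, lam, 1, "xy"))   # None
```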

The automaton is termed finite if all three of the sets A, X, Y
defining it are finite. Since we limit ourselves almost exclusively
to the consideration of finite automata, the word "finite" is often
dropped. The automaton is called completely determinate if its
switching and output functions are given on all pairs (a, x), and
partially determinate otherwise.


The finite automata are customarily specified by two tables,
termed respectively the switching table and the output table of the
automaton. The rows of both tables are designated by the different
letters of the input alphabet X of the automaton, and the columns by
the different states of the automaton. At the intersection of the
x-th row and the a-th column of the switching table stands the
element $\delta(a, x)$, i.e., some state of the automaton from the set
of its internal states, and at the intersection of the x-th row and
the a-th column of the output table stands the element
$\lambda(a, x)$, i.e., some letter of the output alphabet Y of the
automaton. Thus the specification of the switching and output tables
determines both the sets X, Y, A and the switching and output
functions of the automaton. To fix the initial state it is customary
to designate the first column on the left of both tables with this
state. Thus, the use of the two tables makes it possible to specify
any finite automaton, including initial automata.

Another method of specifying finite automata, which provides better
visualization, is that of directed graphs. The vertices of the graph
(shown as circles in the figures) are identified with the various
states of the automaton. An arrow connecting vertex i with vertex j
signifies that there exists an input signal x which transfers the
automaton from state i into state j, i.e., satisfying the relation
$$j = \delta(i, x).$$

In order to indicate precisely which input signals cause the
transfer of the automaton from state i into state j, the arrows
connecting the graph vertices corresponding to these states are
flagged with the symbols of these input signals. The output signal y
determined by the pair (i, x) is usually placed on the graph alongside
the input signal x and, to differentiate it from the input signals, is
enclosed in parentheses.
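Converting the two tables into the arrows of the directed graph is a matter of grouping flags by pair of states. The tables below are a hypothetical two-state example (not the tables of the text), and `graph_arrows` is an illustrative name.

```python
# Hypothetical tables: rows are indexed by input letters, columns by states.
SWITCHING = {"x": {1: 2, 2: 1}, "y": {1: 2, 2: 2}}
OUTPUT = {"x": {1: "u", 2: "v"}, "y": {1: "v", 2: "u"}}

def graph_arrows(switching, output):
    """Map each arrow (i, j) to its flags 'x (y)': input x, output y in parentheses."""
    arrows = {}
    for x, row in switching.items():
        for i, j in row.items():            # j = δ(i, x)
            arrows.setdefault((i, j), []).append(f"{x} ({output[x][i]})")
    return arrows

for (i, j), flags in graph_arrows(SWITCHING, OUTPUT).items():
    print(f"{i} -> {j}: {', '.join(flags)}")
```

Here the arrow from 1 to 2 carries two flags, since both x and y transfer state 1 into state 2.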

Let us consider an example of the specification of a finite
automaton using the switching and output tables and the directed
graph. Let us choose for this purpose the relatively simple automaton
with three internal states 1, 2, 3, two input signals x, y and two
output signals u, v. We assume that this automaton is specified by the
following switching and output tables.

The directed graph shown in Fig. 8 corresponds to these tables.

The automata we have considered above are customarily termed
Mealy automata (after the scientist who first considered several
questions associated with the functioning of such automata; see
[55]). In practice we frequently also have to deal with somewhat
differently defined automata, which are termed Moore automata (see
[57]).

Fig. 8

The Moore automata differ from the Mealy automata only in the method
of defining their output functions. In place of the relation
$$y(t) = \lambda(a(t-1), x(t)),$$
which defines the output signal for the Mealy automata, in the case of
the Moore automata we use a somewhat different relation
$$y(t) = \mu(a(t)). \qquad (46)$$
With the aid of relation (46) and the previously written relation
(44), just as in the preceding case, the representation induced by any
given Moore automaton is determined.
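With relation (46) in place of (45), the simulation changes only in the order of the two updates. The labels and switching table below are hypothetical, and, following the convention adopted for Moore automata in this section that output begins at t = 1, the label of the initial state is not emitted.

```python
def run_moore(delta, mu, a0, word):
    """Output word of a Moore automaton; output starts at t = 1."""
    state, out = a0, []
    for x in word:
        state = delta[(state, x)]   # a(t) = δ(a(t-1), x(t)), relation (44)
        out.append(mu[state])       # y(t) = μ(a(t)), relation (46)
    return "".join(out)

# Hypothetical automaton: states 1 and 2 labelled u and v respectively.
delta = {(1, "x"): 2, (1, "y"): 1, (2, "x"): 1, (2, "y"): 2}
mu = {1: "u", 2: "v"}
print(run_moore(delta, mu, 1, "xxy"))  # vuu
```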

For reasons which will be considered later, we call the function
$y = \mu(a)$ the shifted output function of the Moore automaton. The
value of this function for any state a is customarily termed the
label of this state. Finite Moore automata are conveniently specified
with the use of so-called labelled switching tables. The labelled
switching table is nothing other than the conventional switching table
of an automaton in which the labels of the states are placed above the
symbols of the states designating the various columns of the table.
For example, the labelled switching table


“uv 
123 

alaee 
322 





specifies a Moore automaton which has the same switching table as
the Mealy automaton considered above and in which the output signal u
corresponds to states 1 and 2, and the output signal v corresponds to
state 3. In the representation of Moore automata by graphs, the
symbols of the output signals label the corresponding vertices of the
graph, and not the arrows as in the case of Mealy automata.

We agree to consider that the delivery of the output signals in
the Moore automaton begins at the instant of time t = 1 (and not at
the instant of time t = 0). With this condition, for any Moore
automaton $A_1$ it is not difficult to construct a Mealy automaton
$A_2$ having the same switching table and inducing the same
representation as the automaton $A_1$.

Actually, if $\delta(a, x)$ is the switching function and $\mu(a)$ is
the shifted output function of the Moore automaton $A_1$, then we can
define the Mealy automaton $A_2$ by specifying its switching function
$\delta(a, x)$ and output function $\lambda(a, x) = \mu(\delta(a, x))$.
Then
$$y(t) = \lambda(a(t-1), x(t)) = \mu(\delta(a(t-1), x(t))) = \mu(a(t)),$$
which proves that the automata $A_1$ and $A_2$ react completely
identically to any sequence of input signals. The construction
described is termed the interpretation of the given Moore automaton as
a Mealy automaton.
The physical meaning of such an interpretation (in real automata)
consists in the shift of the automaton time by one elementary interval
of time, on the strength of which in the constructed Mealy automaton
$A_2$ the output signals lead their corresponding output signals in
the Moore automaton $A_1$ by one unit of automaton time. It is
precisely for this reason that the output functions of Moore automata
are termed shifted output functions.
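The interpretation can be checked on a small hypothetical automaton by comparing the two simulations on every input word up to length 5. `moore_to_mealy` builds λ(a, x) = μ(δ(a, x)) exactly as described; the rest is illustrative scaffolding, and the finite comparison is a spot check rather than a proof.

```python
from itertools import product

def run_moore(delta, mu, a0, word):
    state, out = a0, []
    for x in word:
        state = delta[(state, x)]        # a(t) = δ(a(t-1), x(t))
        out.append(mu[state])            # y(t) = μ(a(t))
    return "".join(out)

def run_mealy(delta, lam, a0, word):
    state, out = a0, []
    for x in word:
        out.append(lam[(state, x)])      # y(t) = λ(a(t-1), x(t))
        state = delta[(state, x)]
    return "".join(out)

def moore_to_mealy(delta, mu):
    """Same switching function; output function λ(a, x) = μ(δ(a, x))."""
    return {(a, x): mu[b] for (a, x), b in delta.items()}

delta = {(1, "x"): 2, (1, "y"): 1, (2, "x"): 1, (2, "y"): 2}   # hypothetical
mu = {1: "u", 2: "v"}
lam = moore_to_mealy(delta, mu)
words = ["".join(w) for n in range(1, 6) for w in product("xy", repeat=n)]
same = all(run_moore(delta, mu, 1, w) == run_mealy(delta, lam, 1, w) for w in words)
print(same)  # True
```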

The described time shift makes it possible to consider Moore
automata as a particular case of Mealy automata every time that we are
interested not in the real time of the appearance of a particular
output signal, but only in the order of succession of the output
signals in time. It is exactly this situation which we encounter in
the abstract theory of automata, being interested only in the
representations induced by the automata and the switchings in their
memory, and not in the method of composing a given automaton from the
elementary automata available to us.

In the resolution of the latter question, which constitutes the
subject of the so-called structural theory of automata, Moore automata
are to be considered as a separate class of automata which is not
simply a subclass of the class of all Mealy automata. The difference
between these two classes of automata in structural theory is due to
the fact that in Mealy automata the output signal arises
simultaneously with the input signal which induces it, while in Moore
automata there is a delay of one unit of automaton time.

The possibility of interpreting every Moore automaton as a Mealy
automaton in the abstract theory of automata does not, of course,
imply the reverse possibility. Nevertheless, for any Mealy automaton A
we can construct a Moore automaton B which will induce the same
representation as the automaton A. Here, in contrast with the
preceding case, the set of states of automaton B will not, generally
speaking, coincide with the set of states of automaton A, although it
will be finite whenever the latter set is finite.

Actually, let us assume that there is given an arbitrary Mealy
automaton A with the set of states A, the input alphabet X, the output
alphabet Y, the switching function $\delta(a, x)$, the output function
$\lambda(a, x)$ and the initial state $a_0$. For simplicity of
notation we agree that here and hereafter we shall use the
multiplication symbol in place of the switching function, designating
the value of the function $\delta(a, x)$ by the product $ax$.

Let us construct the Moore automaton B, selecting as the set B of
its states the set consisting of the initial state $a_0$ and all
possible pairs (a, x), where $a \in A$, $x \in X$. The input and
output alphabets of the automaton B coincide respectively with the
input and output alphabets of the automaton A. We determine the
switching function of the automaton B by setting
$$a_0 x = (a_0, x) \quad \text{and} \quad (a, x_i)\, x_j = (a x_i, x_j).$$

We determine the shifted output function $\mu(b)$ of the automaton
B on each state $b = (a, x)$ which differs from the initial state
$a_0$ by the relation $\mu(b) = \lambda(a, x)$. In the initial state
the value of the function $\mu$ can be selected arbitrarily. As a
result some Moore automaton B is constructed.
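The construction of B can be sketched directly: the state (a, x) records the previous state of A together with the last input letter. The two-state Mealy automaton and the arbitrary label `"?"` for $a_0$ are hypothetical choices, and the final comparison only covers words up to length 5.

```python
from itertools import product

def run_mealy(delta, lam, a0, word):
    state, out = a0, []
    for x in word:
        out.append(lam[(state, x)])      # y(t) = λ(a(t-1), x(t))
        state = delta[(state, x)]        # a(t) = δ(a(t-1), x(t))
    return "".join(out)

def run_moore(delta, mu, b0, word):
    state, out = b0, []
    for x in word:
        state = delta[(state, x)]        # b(t) = δ(b(t-1), x(t))
        out.append(mu[state])            # y(t) = μ(b(t))
    return "".join(out)

def mealy_to_moore(states, inputs, delta, lam, a0, initial_label="?"):
    """States of B: a0 itself plus all pairs (a, x); the label of a0 is arbitrary."""
    b_delta = {(a0, x): (a0, x) for x in inputs}              # a0 x = (a0, x)
    for a, xi, xj in product(states, inputs, inputs):
        b_delta[((a, xi), xj)] = (delta[(a, xi)], xj)        # (a, xi) xj = (a xi, xj)
    mu = {(a, x): lam[(a, x)] for a, x in product(states, inputs)}  # μ((a, x)) = λ(a, x)
    mu[a0] = initial_label
    return b_delta, mu

# Hypothetical two-state Mealy automaton with initial state 1.
delta = {(1, "x"): 2, (1, "y"): 1, (2, "x"): 1, (2, "y"): 2}
lam = {(1, "x"): "u", (1, "y"): "v", (2, "x"): "v", (2, "y"): "u"}
b_delta, mu = mealy_to_moore([1, 2], "xy", delta, lam, 1)

words = ["".join(w) for n in range(1, 6) for w in product("xy", repeat=n)]
print(all(run_mealy(delta, lam, 1, w) == run_moore(b_delta, mu, 1, w) for w in words))  # True
```

B has 1 + |A|·|X| = 5 states here, which shows why its state set is finite whenever that of A is.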

It is not difficult to see that the automaton B induces the same
representation as the automaton A. Actually, let us designate by
$\varphi$ the representation induced by the automaton A and by
$\psi$ the representation induced by the automaton B. Assume that for
any input word $p = p_1 x_i$ of length $n \geq 1$ it has already been
proved that $\varphi(p) = \psi(p) = q$ (for an input word of length 1,
i.e., for any single-letter word $x_i$, obviously
$\varphi(x_i) = \psi(x_i) = \lambda(a_0, x_i)$).

Let us consider the reaction of both automata to an arbitrary word
$p x_j$ of length n + 1. Let us agree here and hereafter to designate
by $aw$ the state into which an automaton that was initially in the
state a transfers when the arbitrary word w is applied to its input
sequentially, letter after letter. By the definition of the switching
function of automaton B, the state of B after the word p is
$(a_0 p_1, x_i)$, where $a_0 p_1$ is the state of A after the word
$p_1$. After the input signal $x_j$ is applied to the automata A and
B, the automaton A delivers the output signal
$y = \lambda(a_0 p, x_j)$. Automaton B will obviously transfer into
the state
$$b = (a_0 p_1 x_i, x_j) = (a_0 p, x_j)$$
and will deliver the output signal $\mu(b)$, equal, by the definition
of the function $\mu$, to the signal $\lambda(a_0 p, x_j)$.

Thereby it is shown that the automata A and B react identically to
any input word of length n + 1. Performing an induction with respect
to n, we come to the conclusion that the representations induced by
the automata A and B are identical. This conclusion is valid not only
for conventional (completely determinate) automata, but also for
partial automata.

Let us characterize in more detail the representations induced by
automata. We note that the requirement for the arrival of an input
signal and the departure of an output signal at every instant of
automaton time, which at first glance is not satisfied by many actual
automata, is in reality easily satisfied if we introduce special
letters to designate empty input and output signals (i.e., the
absence of any real physical signals) and consider these letters on a
par with the other letters of the input and output alphabets.

It is easy to see that the representation $\varphi$ induced by an
arbitrary Moore or Mealy automaton satisfies two conditions:

1) to any word p in the input alphabet X the representation
associates a word $\varphi(p)$ in the output alphabet Y which has the
same length as the word p;

2) if the word $p_1$ coincides with an initial segment of the word p,
then the word $\varphi(p_1)$ is an initial segment of the word
$\varphi(p)$.
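Conditions 1) and 2) can be tested mechanically on all words up to a fixed length, which gives evidence, not a proof. The running-parity mapping and word reversal below are hypothetical examples: the first satisfies both conditions, the second violates condition 2).

```python
from itertools import product

def words(alphabet, max_len):
    return ["".join(w) for n in range(max_len + 1) for w in product(alphabet, repeat=n)]

def satisfies_automaticity(phi, alphabet, max_len):
    """Check conditions 1) and 2) on all input words of length <= max_len."""
    for p in words(alphabet, max_len):
        if len(phi(p)) != len(p):                                          # condition 1)
            return False
        if any(not phi(p).startswith(phi(p[:k])) for k in range(len(p))):  # condition 2)
            return False
    return True

def parity(p):
    """'1' after each letter if an odd number of x's has been read so far, else '0'."""
    out, odd = [], False
    for x in p:
        odd ^= (x == "x")
        out.append("1" if odd else "0")
    return "".join(out)

print(satisfies_automaticity(parity, "xy", 6))               # True
print(satisfies_automaticity(lambda p: p[::-1], "xy", 6))    # False (reversal)
```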
Let us term the conditions just formulated the automaticity
conditions of the representation $\varphi$, and every correspondence
between the words in the alphabets X and Y which satisfies these
conditions an automaton representation or automaton operator.
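The two automaticity conditions can be tested mechanically for a mapping given as a function on words. In this minimal sketch the two sample mappings are hypothetical: a running parity, which satisfies both conditions, and word reversal, which preserves length but violates the prefix condition. Only words up to a bounded length are examined.

```python
from itertools import product


def is_automatic(phi, words):
    """Check both automaticity conditions of phi on the given words.
    phi maps tuples of input letters to tuples of output letters."""
    for g in words:
        out = phi(g)
        if len(out) != len(g):          # condition 1: equal lengths
            return False
        for k in range(len(g)):         # condition 2: prefixes go to prefixes
            if phi(g[:k]) != out[:k]:
                return False
    return True


def running_parity(g):
    """Hypothetical automaton operator: parity of the 1s read so far."""
    out, s = [], 0
    for x in g:
        s ^= x
        out.append(s)
    return tuple(out)


def reverse(g):
    """Length-preserving but not automatic: later letters affect earlier output."""
    return tuple(reversed(g))


WORDS = [g for n in range(6) for g in product((0, 1), repeat=n)]
print(is_automatic(running_parity, WORDS))  # True
print(is_automatic(reverse, WORDS))         # False
```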

It is not difficult to show that every automaton representation
can be induced with the aid of some abstract automaton (not
necessarily finite).

Let the automaton correspondence $\varphi$ map the set of words in the
alphabet $X = (x_1, x_2, \ldots, x_n)$ into the set of words in the
alphabet $Y = (y_1, y_2, \ldots, y_m)$. Let us construct the automaton A
whose internal states are all possible words in the alphabet X and whose
initial state is the empty word e (the word of zero length, consisting
of an empty set of letters). The switching function $\delta$ is
determined trivially: if $g$ is any state of the automaton (a word in
the alphabet X) and $x_i$ is any input signal, then the value of the
function $\delta(g, x_i)$ is taken equal to the word $g x_i$. After
defining the output function $\lambda$ by the relation
$\lambda(g, x_i) = y_j$, where $y_j$ is the last letter of the word
$\varphi(g x_i)$, we obtain an automaton which realizes the original
mapping $\varphi$.
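The construction just described can be sketched directly, since the states are simply the words read so far. The mapping `phi` below (running sum of the letters modulo 3) is a hypothetical automaton mapping chosen for illustration; the resulting automaton has infinitely many states, but they are generated lazily, one per input word.

```python
from itertools import product


def phi(word):
    """Hypothetical automaton mapping: running sum of the letters modulo 3."""
    total, out = 0, []
    for x in word:
        total = (total + x) % 3
        out.append(total)
    return tuple(out)


def automaton_from_mapping(phi):
    """States are input words (tuples); the initial state is the empty word ()."""
    start = ()

    def delta(g, x):
        return g + (x,)              # delta(g, x_i) = g x_i

    def lam(g, x):
        return phi(g + (x,))[-1]     # last letter of phi(g x_i)

    return start, delta, lam


def run(start, delta, lam, word):
    state, out = start, []
    for x in word:
        out.append(lam(state, x))
        state = delta(state, x)
    return tuple(out)


START, DELTA, LAM = automaton_from_mapping(phi)


def realizes(max_len):
    """True if the automaton reproduces phi on all words over {0, 1, 2} up to max_len."""
    return all(
        run(START, DELTA, LAM, g) == phi(g)
        for n in range(max_len + 1)
        for g in product(range(3), repeat=n)
    )


print(run(START, DELTA, LAM, (1, 2, 2, 1)))  # (1, 0, 2, 0)
```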

If the mapping $\varphi$ of the set of words in the alphabet X into the
set of words in the alphabet Y is given by a partial automaton, then
it will, of course, be only a partial mapping, not defined on all
words. However, as before, both conditions of automaticity will be
satisfied for this mapping under the additional assumption that
$\varphi(g)$ exists. In this case the second condition of automaticity
takes a stronger form: if $\varphi(g)$ exists and $g_1$ is an initial
segment of the word $g$, then $\varphi(g_1)$ exists and coincides with
an initial segment of the word $\varphi(g)$.

We shall term the rephrased conditions the automaticity conditions
of the partial mapping $\varphi$, and every partial mapping satisfying
these conditions will be termed a partial automaton mapping.
It is easy to establish the validity of the following proposition.

Theorem 1. Every partial automaton mapping can be induced with 
the aid of some partial automaton (not necessarily finite). 

This proposition is proved by exactly the same method as in the 
case of the complete mapping. The difference is that the states of 
the partial automaton are considered to be not all the words of the
input alphabet, but only those on which the mapping $\varphi$ is defined.

At first glance the automaticity conditions severely narrow the 
class of mappings which can be specified with the aid of the abstract
automata. It is well known, in particular, that the requirement of
equal lengths of the input and output words is not satisfied by a
large portion of the algorithms which must be realized by particular
specific automata. This difficulty, which seems very serious at first
glance, is in actuality easily removed by recoding the input and
output information on the basis of a very simple technique.

The standard technique for converting any partial correspondence
$\varphi$ between words in the alphabets X and Y into a partial automaton
correspondence is based on introducing into the alphabets X and Y a
letter a which was not previously contained in them. The letter a is
termed the empty letter. The appearance of the empty letter at the
automaton input corresponds to the case when in actuality nothing is
applied to the automaton input. Similarly, the appearance of the empty
letter as an output signal signifies the absence of any signal at the
automaton output.

Let us consider an arbitrary word $g$ of length $n$ in the alphabet
X, to which the initially specified partial mapping $\varphi$ associates
the word $q = \varphi(g)$ of length $m$ in the alphabet Y. Let us
designate by $g_1$ the word in the alphabet $X_1 = X \cup \{a\}$ which is
obtained by suffixing to the word $g$ on the right $m$ copies of the
letter a. Similarly, we use $q_1$ to designate the word in the alphabet
$Y_1 = Y \cup \{a\}$ obtained by prefixing to the word $q$ on the left $n$
copies of the letter a. Both words then have the length $n + m$. We
term this technique the standard technique for equalizing word lengths.

Let us define a new partial mapping $\varphi_1$ between words in the
alphabets $X_1$ and $Y_1$, setting $q_1 = \varphi_1(g_1)$ and repeating this
technique for every word $g$ in the alphabet X on which the mapping
$\varphi$ is defined. We further define this correspondence on all the
initial segments $g^{(i)}$ of the word $g_1$, assuming that
$\varphi_1(g^{(i)})$ coincides with the initial segment of the word
$\varphi_1(g_1)$ having a length equal to that of $g^{(i)}$.

With this redefinition there arises the danger of losing the
uniqueness of the mapping $\varphi_1$, since the word $g^{(i)}$ can occur
not only as an initial segment of the original word $g_1$, but also as
an initial segment of another word, for example of the word $s_1$
obtained by applying the standard technique for equalizing word
lengths to some word $s$ in the alphabet X.

Since the word $s_1$ has the form $s_1 = s a a \ldots a$ and the word
$g_1$ has the form $g_1 = g a a \ldots a$, where the words s and g do
not contain the letter a, we have $p = s = g$ if the word $g^{(i)}$ has
at least one letter a on the right: $g^{(i)} = p a \ldots a$. In this
case, consequently, the words $s_1$ and $g_1$ must coincide with one
another (since $s = g$, both carry the same number of letters a), and
there is no danger of ambiguity arising.

It remains, thus, to consider the case when the word $g^{(i)} = p$
consists exclusively of letters of the alphabet X. In this case the
length of the word p clearly does not exceed the lengths of the words
g and s. But then, by the standard technique for equalizing word
lengths, the initial segments of the words $\varphi_1(g_1)$ and
$\varphi_1(s_1)$ having a length equal to that of the word $g^{(i)} = p$
consist entirely of the letters a and, consequently, coincide with one
another. Thus, ambiguity is excluded in this case as well.

The partial mapping $\varphi_1$ between words in the alphabets $X_1$ and
$Y_1$ which we have constructed satisfies both conditions of automaticity
for partial mappings by the very method of its construction, and is,
consequently, the sought partial automaton mapping.
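The standard technique for equalizing word lengths, together with the extension of $\varphi_1$ to initial segments, can be sketched as follows. Assumptions: words are Python strings, the empty letter is the character `a` (assumed absent from X and Y), and the sample correspondence `PHI` (each word mapped to its reversal written twice) is a hypothetical non-automaton mapping. The `ValueError` branch corresponds to the ambiguity that the argument above rules out.

```python
from itertools import product

EMPTY = "a"  # the added empty letter; assumed absent from X and Y


def equalize(phi_table):
    """Standard technique for equalizing word lengths.
    phi_table maps input words g to output words q = phi(g).
    Returns phi_1 on every g_1 = g a^m and on all its initial segments."""
    phi1 = {}
    for g, q in phi_table.items():
        n, m = len(g), len(q)
        g1, q1 = g + EMPTY * m, EMPTY * n + q
        for k in range(len(g1) + 1):
            prefix, image = g1[:k], q1[:k]
            # The argument in the text shows no prefix receives two different images.
            if phi1.setdefault(prefix, image) != image:
                raise ValueError(f"ambiguous value on {prefix!r}")
    return phi1


def is_partial_automaton_mapping(phi1):
    """Both automaticity conditions for a partial mapping given as a dict."""
    for w, out in phi1.items():
        if len(out) != len(w):
            return False
        for k in range(len(w)):
            if phi1.get(w[:k]) != out[:k]:
                return False
    return True


# Hypothetical non-automaton mapping: each word goes to its reversal written twice.
PHI = {"".join(w): "".join(w)[::-1] * 2
       for n in range(5) for w in product("01", repeat=n)}
PHI1 = equalize(PHI)
print(PHI1["01aaaa"])                        # aa1010
print(is_partial_automaton_mapping(PHI1))    # True
```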

The described technique for transforming any partial mapping into an
automaton mapping is universal; however, precisely because of its
universality it does not always lead to the most economical solution
(from the point of view of the use of additional letters). This
circumstance is particularly easy to see in the case when the original
partial mapping $\varphi$ itself satisfies both conditions of automaticity.
It is clear that the most economical solution in this case is
$\varphi_1 = \varphi$. However, the standard method described above (if we
use it in this case as well) leads to an unnecessary increase in the
lengths of the original words which participate in the correspondence.

Thus, the universal technique we have found does not remove the need
to look for more economical solutions. Such economical solutions are
usually found by adding empty letters to the words gradually, step by 
step, rather than all at once in the quantity provided for by the 
standard technique for equalizing word lengths, checking at each step 
for satisfaction of the automaticity conditions and stopping as soon 
as they are satisfied for the first time. Such an improved technique 
for equalizing word lengths will lead sooner or later to the appearance 
of the automaton mapping. 

Of considerable interest is the problem of finding an economical
recoding of a mapping given in a particular algorithmic language
(for example, in the language of normal algorithms) for the purpose
of converting it into an automaton correspondence, and also the
problem of constructing a theory of algorithms which satisfy the
conditions of automaticity and are therefore termed automaton
algorithms for short. One of the possible approaches to the theory of
automaton algorithms is developed in the following section.
§2. EVENTS AND REPRESENTATION OF EVENTS IN AUTOMATA 

Let A be an arbitrary (generally speaking, partial) initial automaton
and $\varphi$ the mapping induced by it. For each letter $y_i$ of the output
alphabet $Y = (y_1, y_2, \ldots, y_m)$ of automaton A let us consider the
set $R_i$ of all words g in the input alphabet $X = (x_1, x_2, \ldots, x_n)$
of this automaton for which the word $\varphi(g)$ is defined and ends
with the letter $y_i$.

Let us term the set $R_i$ thus defined the event represented in the
(partial) automaton A by the output signal $y_i$. If M is any set of
output signals, then we shall term the union of the events represented
by all elements of this set the event represented in the partial
automaton A by the set M.

It is easy to see that the sets $R_i$ are disjoint and that the set
S of all words in the alphabet X which do not occur in any of the
sets $R_i$ $(i = 1, 2, \ldots, m)$ consists of all words forbidden for the
given partial automaton. Here and hereafter we use the term forbidden
for all words in the input alphabet which, when applied to the input
of the given partial automaton, lead for at least one of their letters
to an output signal which is not defined in the automaton. We agree to
call the ensemble S of all forbidden words the forbidden domain of the
given partial automaton A. We agree also to term any set of words in
the alphabet X an event in this alphabet.
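The partition into the events $R_i$ and the forbidden domain S can be computed by simulation for a small partial automaton. The tables below are hypothetical; the pair $(a_2, x_2)$ is deliberately left undefined, and only nonempty words up to a bounded length are enumerated.

```python
from itertools import product

# Hypothetical partial Mealy automaton A with initial state a1,
# X = ('x1', 'x2'), Y = ('y1', 'y2'); the pair (a2, x2) is left undefined.
DELTA = {("a1", "x1"): "a2", ("a1", "x2"): "a1", ("a2", "x1"): "a1"}
LAM = {("a1", "x1"): "y1", ("a1", "x2"): "y2", ("a2", "x1"): "y2"}


def last_output(word, start="a1"):
    """Last letter of phi(word), or None if phi(word) is undefined."""
    state, y = start, None
    for x in word:
        if (state, x) not in DELTA:
            return None  # the word is forbidden
        y = LAM[(state, x)]
        state = DELTA[(state, x)]
    return y


def partition(max_len):
    """Split all nonempty words up to max_len into R_y1, R_y2 and the forbidden domain.
    The empty word is left out, in line with the convention adopted later in the text."""
    events = {"y1": set(), "y2": set()}
    forbidden = set()
    for n in range(1, max_len + 1):
        for word in product(("x1", "x2"), repeat=n):
            y = last_output(word)
            if y is None:
                forbidden.add(word)
            else:
                events[y].add(word)
    return events, forbidden


events, forbidden = partition(3)
print(sorted(forbidden))
```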
From the definitions introduced, we can formulate the result obtained
above in the form of the following proposition.

Theorem 1. Specification of the partial automaton mapping $\varphi$,
realizable by the partial automaton A with the input alphabet
$X = (x_1, x_2, \ldots, x_n)$ and the output alphabet
$Y = (y_1, y_2, \ldots, y_m)$, uniquely determines the partition of the
set F of all words in the alphabet X into $m + 1$ disjoint events in the
alphabet X, namely into the events $R_1, R_2, \ldots, R_m$ represented in
the automaton A by the output signals $y_1, y_2, \ldots, y_m$, and the
forbidden domain S of the given (partial) automaton A.

And conversely: knowing the events $R_1, R_2, \ldots, R_m$ represented
in some partial automaton A by the output signals $y_1, y_2, \ldots, y_m$,
we can uniquely recover the partial mapping $\varphi$ between the words of
the input alphabet X and the output alphabet Y realized by this
automaton, without using the switching and output functions of the
automaton.

Let there be given an arbitrary word $g = x_{i_1} x_{i_2} \ldots x_{i_n}$ in
the alphabet X. For each k $(1 \le k \le n)$ we find the output signal
$y_{j_k}$ using the rule: $y_{j_k}$ is the output signal representing in the
automaton A the event $R_{j_k}$ which contains the initial segment
$x_{i_1} x_{i_2} \ldots x_{i_k}$ of length k of the word g. If for all
$k = 1, 2, \ldots, n$ the corresponding $y_{j_k}$ exist, then we set

$$\varphi(g) = \varphi(x_{i_1} x_{i_2} \ldots x_{i_n}) = y_{j_1} y_{j_2} \ldots y_{j_n}.$$

In the case where an output signal $y_{j_k}$ with the required properties
does not exist for even one $k = 1, 2, \ldots, n$, we assume that the
partial mapping $\varphi$ is not defined on the word g.

It is not difficult to see that, by the definition of the events
represented in an automaton, the partial mapping $\varphi$ introduced in
this fashion is precisely the partial mapping induced by the given
partial automaton A.
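The recovery rule can be written without any reference to switching or output tables, using only membership tests for the events. In this sketch the two events are hypothetical and are given as Python regular expressions over the letters `0` and `1` (words with an even and with an odd number of 1s); any other disjoint family of events could be substituted.

```python
import re
from itertools import product

# Hypothetical events over X = {'0', '1'}, written as Python regular expressions.
# Together they cover every nonempty word, as for a complete automaton.
EVENTS = {
    "y_even": re.compile(r"(?:0*10*1)*0*"),    # even number of 1s
    "y_odd": re.compile(r"0*1(?:0*10*1)*0*"),  # odd number of 1s
}


def recover_phi(word):
    """phi(word) by the rule in the text: the k-th output letter is the signal
    whose event contains the initial segment of length k; None if undefined."""
    out = []
    for k in range(1, len(word) + 1):
        prefix = word[:k]
        signals = [y for y, rx in EVENTS.items() if rx.fullmatch(prefix)]
        if not signals:
            return None
        out.append(signals[0])  # the events are disjoint, so at most one matches
    return out


def check_automaticity(max_len):
    """The recovered mapping preserves length and maps prefixes to prefixes."""
    for n in range(1, max_len + 1):
        for w in product("01", repeat=n):
            g = "".join(w)
            out = recover_phi(g)
            if out is None or len(out) != len(g) or recover_phi(g[:-1]) != out[:-1]:
                return False
    return True


print(recover_phi("0110"))  # ['y_even', 'y_odd', 'y_even', 'y_even']
```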

On the basis of this discussion we can formulate the following
proposition.

Theorem 2. The specification of the partial automaton mapping $\varphi$
between words in the alphabets X and $Y = (y_1, y_2, \ldots, y_m)$ is
equivalent to the specification of the events $R_1, R_2, \ldots, R_m$
represented by the output signals $y_1, y_2, \ldots, y_m$ in the partial
automaton A which induces the mapping $\varphi$.

Theorem 2 lays the foundation for the study of the automaton map- 
pings (in particular the automaton algorithms). For the description 
of such mappings it is sufficient to specify the partition of the set 
of all words of the input alphabet into a finite number of disjoint
events. In order that the corresponding descriptions be of a construc- 
tive nature, it is necessary to limit ourselves to the consideration 
of only those events which admit effective description. 

It is natural that first of all the finite events, i.e., events 
consisting of a finite number of words, admit simple constructive de- 
scription. They can be described with the listing of the elements ap- 
pearing in them. For the characterization of some important classes 
of infinite events, it is advisable to introduce several operations 
on the set of events, thus transforming this set into an algebra - the 
algebra of events. 

For our purposes the most convenient is the system of three operations
which is a modification of the operations first introduced by
Kleene [40] (see also Copi, Elgot, Wright [45] and Glushkov [21]). 

The first operation is that of the set-theoretic union of events.
We shall designate this operation by the symbol $\vee$ and term it event
disjunction.

The second operation is that of event multiplication, which is
not to be confused with the operation of set-theoretic intersection.
If the event S consists of the words $g_\alpha$ $(\alpha \in M)$ and the event R
consists of the words $q_\beta$ $(\beta \in N)$, then the product of the
events S and R is the name given to the event consisting of all
possible words of the form $g_\alpha q_\beta$ $(\alpha \in M, \beta \in N)$. The
operation of event multiplication is noncommutative: generally
speaking, the events SR and RS are different.
The third operation is that of the so-called event iteration, for
which we shall use braces as the designation, so that $\{S\}$ denotes
the iteration of the event S. The iteration of any event S is defined
as the union of the empty word e, the event $S = S^1$, the event
$S \cdot S = S^2$, the event $S \cdot S \cdot S = S^3$, and so on to infinity.
In other words, if the event S consists of the words $g_\alpha$ $(\alpha \in M)$,
then its iteration $\{S\}$ consists of all possible words of the form
$g_{\alpha_1} g_{\alpha_2} \ldots g_{\alpha_n}$, where $\alpha_1, \alpha_2, \ldots, \alpha_n \in M$
and $n = 0, 1, 2, 3, \ldots$.

We shall term the braces used for the designation of iteration 
iteration brackets. For the designation of the order of operations we 
shall make use of round brackets, which we term conventional brackets. 
In the absence of brackets altering the usual order of operations,
iteration is performed first, then multiplication, and finally
disjunction.
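The three operations can be modelled on finite sets of words if every result is truncated at a maximal length, since iteration generally produces infinite events. This is a sketch under that truncation assumption; the function names are illustrative, and words are Python strings with `''` playing the role of e. The example evaluates $(x \vee y)x\{y\}$ in the stated order: iteration, then multiplication, then disjunction.

```python
def disjunction(s, r):
    """S v R: set-theoretic union of events."""
    return s | r


def multiply(s, r, max_len):
    """SR: all words g q with g in S and q in R, kept up to max_len letters."""
    return {g + q for g in s for q in r if len(g) + len(q) <= max_len}


def iteration(s, max_len):
    """{S}: e, S, SS, SSS, ..., truncated to words of at most max_len letters."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = {w + u for w in frontier for u in s
                    if u and len(w) + len(u) <= max_len} - result
        result |= frontier
    return result


# (x v y)x{y}, evaluated in the order iteration, multiplication, disjunction.
x, y = {"x"}, {"y"}
expr = multiply(multiply(disjunction(x, y), x, 4), iteration(y, 4), 4)
print(sorted(expr))  # ['xx', 'xxy', 'xxyy', 'yx', 'yxy', 'yxyy']
```

Truncating every intermediate result is harmless here because multiplication and iteration never shorten a word.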

We agree to designate the single-element events, i.e., events
consisting of a single word, by the symbol of this word. If
$X = (x_1, x_2, \ldots, x_n)$, then the $n + 1$ single-element events
$x_1, x_2, \ldots, x_n, e$ are termed the elementary events in this alphabet.

Here and in the future we shall use the letter e to denote the
empty word, consisting of an empty set of letters and consequently
having zero length. This word will play only an auxiliary, service
role. We agree, in particular, not to consider events which differ
from one another only by the empty word as different. Thus, the empty
word can, as desired, either be joined to or removed from any event in
question. This is associated with the fact that, by the definitions
we have adopted, the empty word cannot be represented in the automaton.

We shall now introduce a concept which is central to all the 
subsequent considerations. 

Any event which can be obtained from the elementary events
$x_1, x_2, \ldots, x_n, e$ in the finite alphabet $X = (x_1, x_2, \ldots, x_n)$
with the aid of a finite number of operations of disjunction,
multiplication and iteration is termed a regular event in this
alphabet.

This definition goes back to the definition of the regular event
given earlier by Kleene [40], although it differs considerably in form
(see Glushkov [21]). We note that the same event can be represented
differently in terms of the elementary events. In the future we
shall term each such representation (formula of event algebra) a
regular expression.
One of the primary problems in event algebra is the establishment
of the laws of equivalent transformations of regular expressions,
i.e., those transformations which do not change the events represented
by these expressions (up to the empty word e).

Among the laws which are very frequently utilized in equivalent
transformations in event algebra are the laws of associativity
for disjunction and multiplication, the commutativity law for
disjunction, the left and right distributive laws for multiplication
with respect to disjunction ($S(R \vee Q) = SR \vee SQ$,
$(R \vee Q)S = RS \vee QS$) and others.

The laws of distributivity make possible, in particular, the removal
of brackets and the bringing of common factors outside of the
brackets (as in conventional algebra). Here we need only recall that
multiplication in event algebra is, generally speaking, not commutative.
Any word can be represented as the product of elementary events:
the individual letters constituting this word. Any finite event is
represented in the form of the disjunction of the words composing it.
This implies, in particular, that all finite events are regular.

The use of iteration leads to the construction of infinite regular
events. At the same time it is not difficult to construct simple
examples of nonregular infinite events. For this it is sufficient to
select an increasing sequence of whole numbers $n_1, n_2, \ldots, n_i, \ldots$
such that the differences $n_{i+1} - n_i$ $(i = 1, 2, \ldots)$ are not
bounded in the aggregate (this condition is satisfied, for instance, by
the sequence of squares of the natural numbers), and in any input
alphabet X construct the event S consisting of all words in the
alphabet X having lengths equal to $n_1$, $n_2$, and so on.

The event S constructed in this way is necessarily nonregular.
Indeed, assuming the opposite, we would be able to find for S some
regular expression R. Since the event S is infinite, this expression
contains at least one pair of iteration brackets enclosing an
expression differing from the empty word e. Let us replace all the
remaining iteration brackets in the expression R by the empty word, and
the identified brackets by the expression $\{p\}$, where p is an
arbitrary nonempty word from the event enclosed in the identified
brackets. As a result we obtain the regular expression $R_1$ for some
event contained in the event S.

From the expression $R_1$ it follows directly that the event S contains
words of the form rs, rps, rpps, rppps, ..., whose lengths constitute
an infinite increasing arithmetic progression. But this contradicts the
method of construction of the event S. Consequently, the event S cannot
be represented by any regular expression, i.e., it is a nonregular
event.
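The contradiction invoked at the end of the argument can be spelled out in one step. Writing $c = |r| + |s|$ and $d = |p| > 0$ (notation introduced only for this derivation), every term of the progression is one of the lengths $n_i$, so for each $n_i \ge c$ the next term of the progression above $n_i$ is at most $n_i + d$:

```latex
\begin{aligned}
&c + kd \in \{n_1, n_2, \ldots\} \quad (k = 0, 1, 2, \ldots) \\
&\Longrightarrow\ n_{i+1} - n_i \le d \quad \text{for every } i \text{ with } n_i \ge c, \\
&\Longrightarrow\ \sup_i\,(n_{i+1} - n_i) \le \max\Bigl(d,\ \max_{n_i < c}\,(n_{i+1} - n_i)\Bigr) < \infty ,
\end{aligned}
```

since only finitely many $n_i$ lie below c. The differences would therefore be bounded, contrary to the choice of the sequence.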




Let us define also the concept of the cyclic depth of a regular
expression, meaning by this the maximal number of pairs of iteration
brackets nested in one another which are contained in this expression.
For example, the expression $\{x\{y\}\{x\}\}$ has a cyclic depth of 2,
while the expression $(x \vee y)x\{y\}$ has a cyclic depth of 1. By the
cyclic depth of a regular event we shall understand the minimal cyclic
depth of the regular expressions representing it.
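The cyclic depth of a given expression is just the maximal nesting of iteration brackets, which a single left-to-right scan computes. The sketch below assumes expressions are written as plain strings with `{` and `}` as iteration brackets. It measures an expression, not an event: the depth of an event is a minimum over all expressions representing it and is not obtained this way.

```python
def cyclic_depth(expr):
    """Maximal nesting of iteration brackets { } in a regular expression string.
    Round brackets and letters are ignored."""
    depth = best = 0
    for ch in expr:
        if ch == "{":
            depth += 1
            best = max(best, depth)
        elif ch == "}":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced iteration brackets")
    if depth != 0:
        raise ValueError("unbalanced iteration brackets")
    return best


print(cyclic_depth("{x{y}{x}}"))  # 2
print(cyclic_depth("(x∨y)x{y}"))  # 1
```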


Regular events have particular importance for the abstract theory
of automata, since the class of regular events coincides with the
class of events representable in finite automata. In the following
sections we shall prove this important proposition; here we shall
consider the question of the relationship between the classes of
events representable in the Mealy and Moore automata.

The general definition of the representation of events in an 
automaton given in the beginning of the present section related to the 
Mealy automaton. Since the Moore automaton is a particular case of the
Mealy automaton, this definition is applicable in full measure to it as
well. However, in practice it is convenient for the Moore automata to
represent the events not by the property of the output at the instant 
of the application of the last input signal of the words comprising 
the events, but by the property of the state of the automaton after 
the arrival at the input of the automaton of a word of a particular 
event. 

In other words, it is customary to consider that in the case of 
the Moore automata the events are some sets of the automaton states. 
On the strength of the definition of the Moore automata, this method 
of representing the events is completely equivalent to the method of 
representing the events by the sets of the output signals. The dif- 
ference lies only in that with the representation of the events by the 
sets of the automaton states the empty word e is representable (with
the aid of the initial state), while e cannot be represented by any
output signal (if, of course, we do not begin the time reckoning
from negative instants of time).

However, we have agreed above not to consider events differing 
from one another only by the empty word e as different. Therefore the 
two methods of representation of events (states or output signals) 
in the case of the Moore automata are actually equivalent. 

Since the Moore automata can be considered in the abstract theory
as a particular case of the Mealy automata, it would seem natural that
the class of events represented in the Moore automata is narrower than
the class of events represented in the Mealy automata. In reality this
is not so.

Let us assume that some event S is represented in some Mealy 
automaton A by the set M of its output signals. It is not difficult 
to see that the event S can be represented by some set of internal 
states of the Moore automaton B (inducing the same mapping 9 as the 
automaton A) which was constructed in the preceding section. 

We recall that the states of the automaton B are all possible
pairs (a, x), composed of the states a of the automaton A and the
letters x of its input alphabet X, and also the initial state $a_0$ of
the automaton A. The shifted output function $\mu$ of the automaton B on
the initial state $a_0$ is determined arbitrarily, while on the state
$b = (a, x)$ it is determined by the relation $\mu(b) = \lambda(a, x)$,
where $\lambda(a, x)$ is the output function of automaton A.

If $h = g x_i$ is an arbitrary nonempty input word, then in the
automaton A the last letter of the corresponding output word will
obviously be $y = \lambda(a_0 g, x_i)$. The automaton B, as it is not
difficult to see, will be converted by the word h from the initial
state $a_0$ into the state $(a_0 g, x_i)$.

Thus, all the nonempty words of the original event will be represented
in automaton B by the set K of all possible states $(a_s, x_t)$ for which
the relation $\lambda(a_s, x_t) \in M$ is valid.
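The set K can be formed directly from the output table of A. This hypothetical example (tables invented for illustration, initial state $a_0$) builds K for a chosen set M and checks, for all nonempty words up to a fixed length, that B lands in K exactly when the last output of A lies in M.

```python
from itertools import product

# Hypothetical complete Mealy automaton A with initial state a0.
STATES, INPUTS = ("a0", "a1"), (0, 1)
DELTA = {("a0", 0): "a1", ("a0", 1): "a0", ("a1", 0): "a0", ("a1", 1): "a1"}
LAM = {("a0", 0): "y1", ("a0", 1): "y2", ("a1", 0): "y2", ("a1", 1): "y1"}
M = {"y1"}  # set of output signals representing the event in A

# States of B that represent the nonempty words of the event.
K = {(a, x) for a in STATES for x in INPUTS if LAM[(a, x)] in M}


def mealy_last_output(word):
    """Last output letter of A on a nonempty word."""
    state = "a0"
    for x in word:
        y, state = LAM[(state, x)], DELTA[(state, x)]
    return y


def moore_state(word):
    """State of B after a nonempty word g x_i, namely (a0 g, x_i)."""
    b = ("a0", word[0])
    for x in word[1:]:
        b = (DELTA[b], x)
    return b


def check(max_len):
    return all(
        (mealy_last_output(w) in M) == (moore_state(w) in K)
        for n in range(1, max_len + 1)
        for w in product(INPUTS, repeat=n)
    )


print(sorted(K))   # [('a0', 0), ('a1', 1)]
print(check(8))    # True
```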

§3. ANALYSIS OF FINITE AUTOMATA 

The analysis problem amounts to the determination of the events 
represented in the automaton by sets of output signals (in the case of 
the Mealy automata) or by sets of the internal states (in the case of 
the Moore automata). Since every Moore automaton can be interpreted 
as a Mealy automaton, it is sufficient to learn to analyze only the 
Mealy automata. 

We shall resolve the analysis problem only for the case of finite 
Mealy automata. All events represented in such automata are necessarily 
regular. The analysis algorithm is applied to the switching and output 
tables of the automaton being analyzed and as the final information 
gives the regular expressions for the events representable by each of 
the output signals of the automaton. An event which is representable 
by an arbitrary set of output signals is written, then, as the dis- 
junction of events represented by the individual output signals com- 
posing the given set. 

Let us consider an arbitrary finite Mealy automaton A with the set of
internal states $(a_1, a_2, \ldots, a_p)$, the input alphabet
$X = (x_1, x_2, \ldots, x_n)$ and the output alphabet $Y = (y_1, y_2, \ldots, y_m)$.

Considering the initial state $a_1$, the switching function
$\delta(a_i, x_j)$ and the output function $\lambda(a_i, x_j)$ of the
automaton A as given, we shall look for the regular expression R for
the event represented by some output signal, say the signal $y_i$. We
write out the internal states $a_{j_0}, a_{j_1}, \ldots, a_{j_k}$ (with
$a_{j_0} = a_1$) into which the automaton A passes from the initial state
$a_1$ under the action of the successive initial segments
$e, x_{i_1}, x_{i_1} x_{i_2}, \ldots, x_{i_1} x_{i_2} \ldots x_{i_k}$ of some input
word $q = x_{i_1} x_{i_2} \ldots x_{i_k}$.
Inserting the symbols of the states obtained into the word q after
the corresponding initial segments, we transform this word into the
new word $q' = a_{j_0} x_{i_1} a_{j_1} x_{i_2} \ldots a_{j_{k-1}} x_{i_k} a_{j_k}$, which
we agree to call the path corresponding to the word q. Deleting from a
given path the symbols of the internal states $a_s$, we obtain the input
word corresponding to this path.

We shall also use the so-called curtailed paths, obtained from the
conventional paths by dropping the extreme right symbol of an internal
state $a_{j_k}$. We designate the path corresponding to the given input
word q by $q'$ and the curtailed path by $q''$.
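Paths, curtailed paths and the recovery of the input word are simple list manipulations once a switching table is fixed. The table below is hypothetical, and paths are represented as Python lists alternating state symbols and input letters.

```python
# Hypothetical switching table of a two-state automaton with initial state a1.
DELTA = {("a1", "x1"): "a2", ("a1", "x2"): "a1",
         ("a2", "x1"): "a1", ("a2", "x2"): "a2"}
STATES = {"a1", "a2"}


def path(word, start="a1"):
    """Path q' = a_{j0} x_{i1} a_{j1} ... x_{ik} a_{jk} for the input word q."""
    state, result = start, [start]
    for x in word:
        state = DELTA[(state, x)]
        result += [x, state]
    return result


def curtailed(p):
    """Curtailed path q'': drop the extreme right state symbol."""
    return p[:-1]


def input_word(p):
    """Delete the state symbols from a path to recover its input word."""
    return [s for s in p if s not in STATES]


q1 = path(["x1", "x2", "x1"])
print(q1)  # ['a1', 'x1', 'a2', 'x2', 'a2', 'x1', 'a1']
```

This particular path returns to $a_1$, so it is of cyclic type $a_1$; it is not simple, because its curtailed path contains $a_2$ twice.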

It is evident that for the nonempty input word q to belong to the
event $R_i$ represented in the automaton A by the output signal $y_i$, it
is necessary and sufficient that the curtailed path $q''$ corresponding
to the word q terminate with a pair $a_s x_t$ for which the output
function takes the value $y_i$, i.e., $\lambda(a_s, x_t) = y_i$. We term
all such (curtailed) paths paths of type $y_i$ or, generalizing (for any
i), representative type paths.

Paths (uncurtailed) corresponding to the input words which transform
the automaton from some state $a_s$ into the same state $a_s$ are termed
type $a_s$ paths, or cyclic type paths. If in some path $q'$ of the cyclic
type $a_s$ there are no symbols of any of the internal states
$a_{t_1}, a_{t_2}, \ldots, a_{t_r}$, then we shall also term the path $q'$ a
path of type $a_s[a_{t_1}, a_{t_2}, \ldots, a_{t_r}]$ (the symbols in the square
brackets are termed forbidden).
The path $q'$ of arbitrary type is termed simple if the curtailed path
$q''$ corresponding to it does not contain two identical symbols of
internal states. Only a finite number of different simple paths
exists in a finite automaton. All simple paths of any given type can
be found directly from the switching table or (for paths of the
representative type) from the switching and output tables of the
automaton.
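Because states may not repeat in a simple curtailed path, all simple paths of a representative type can be enumerated by a depth-first search over the switching and output tables. This sketch assumes hypothetical tables with initial state `a1`; the function name is illustrative, and the search only extends a path into a state not yet visited, which guarantees termination.

```python
# Hypothetical switching and output tables; initial state a1.
DELTA = {("a1", "x1"): "a2", ("a1", "x2"): "a1",
         ("a2", "x1"): "a1", ("a2", "x2"): "a2"}
LAM = {("a1", "x1"): "y1", ("a1", "x2"): "y2",
       ("a2", "x1"): "y2", ("a2", "x2"): "y1"}
INPUTS = ("x1", "x2")


def simple_representative_paths(y, start="a1"):
    """All simple curtailed paths a_{j0} x_{i1} ... a_{j(k-1)} x_{ik} starting at
    `start`, with pairwise distinct states, whose last pair gives output y."""
    found = []

    def extend(prefix, visited, state):
        for x in INPUTS:
            if (state, x) not in DELTA:
                continue                     # undefined in a partial automaton
            candidate = prefix + [state, x]
            if LAM[(state, x)] == y:
                found.append(candidate)      # a simple path of type y
            nxt = DELTA[(state, x)]
            if nxt not in visited:           # keep the curtailed path simple
                extend(candidate, visited | {nxt}, nxt)

    extend([], {start}, start)
    return found


print(simple_representative_paths("y1"))
# [['a1', 'x1'], ['a1', 'x1', 'a2', 'x2']]
```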

Let us construct some auxiliary events in the alphabet
$Z = (x_1, x_2, \ldots, x_n, a_1, a_2, \ldots, a_p)$ whose elements are
curtailed paths in the given automaton A. We define the event $S(y_i)$ of
type $y_i$ as the event consisting of all (curtailed) paths of type $y_i$;
we define the simple event $P(y_i)$ of type $y_i$ as the disjunction of all
simple (curtailed) paths of type $y_i$. We shall term the iteration of the
disjunction of all simple (curtailed) paths of type t the simple event
$P(t)$ of the given cyclic type $t = a_j[a_{k_1}, a_{k_2}, \ldots, a_{k_r}]$
$(j = 1, 2, \ldots, p;\ 0 \le r \le p - 1)$ (the disjunction of an empty set
of paths is the impossible event, whose iteration coincides with the
empty word e), and shall term the event consisting of all curtailed
paths of type t the event $S(t)$ of type t. Finally, we term the
iteration of the portion of the event $S(t)$ containing words with only a
single occurrence of the symbol $a_j$ the conditionally simple event
$U(t)$ of type $t = a_j[a_{k_1}, a_{k_2}, \ldots, a_{k_r}]$.

Let there be given some set of (curtailed) paths of type $y_i$,
specified with the aid of the regular expression Q. Inserting into this
regular expression, ahead of each occurrence in it of the symbol of the
internal state $a_j$, the regular expression of the event $S(a_j)$ of type
$a_j$ or of the event $S(t)$ of type $t = a_j[a_{k_1}, \ldots, a_{k_r}]$, we obtain
a new regular expression representing, as before, only paths of type
$y_i$. We term this operation the embedding of the event $S(a_j)$ in the
event Q.

Now let $q''$ be an arbitrary (curtailed) path of type $y_i$. The first
(left) symbol of an internal state occurring in this path will be the
symbol $a_1$. Let us isolate also the last (extreme right) occurrence of
the symbol $a_1$ in the path: $q'' = a_1 \ldots a_1 s$, where the word s no
longer contains the symbol $a_1$. Then the path $q''$ can be represented
as the product of some number of words of the conditionally simple
event $U(a_1)$ of type $a_1$ and the word $a_1 s$.

In the word s we find the first (left) symbol of an internal state:
$s = x_{i_1} a_{j_1} \ldots$. After finding also the last occurrence of this
symbol in the word s, we obtain the possibility of representing the
word s as the product of the letter $x_{i_1}$, some number of words of
the conditionally simple event of type $a_{j_1}[a_1]$, and some word
$a_{j_1} r$, where r contains neither of the two internal-state symbols
$a_1$ and $a_{j_1}$.

We further come to the conclusion that the path $q''$ is contained in
the event obtained by embedding the conditionally simple event of type
$a_1$ in some simple path of type $y_i$ ahead of the first occurrence in
it of the letter $a_1$, and embedding the conditionally simple events of
types $a_{j_1}[a_1]$, $a_{j_2}[a_1, a_{j_1}]$, ... ahead of the succeeding
occurrences in this path of the symbols of the internal states
$a_{j_1}, a_{j_2}, \ldots$, and so on.

But exactly the same process can obviously be repeated with the words
of the conditionally simple events which were separated from the
original path $q''$. After this we come to the conclusion that the path
$q''$ of type $y_i$ occurs in the event obtained by embedding in the
simple event $P(y_i)$ of type $y_i$ not the conditionally simple, but the
ordinary simple events of types $a_1$, $a_{j_1}[a_1]$, ..., and then
embedding in the paths constituting the embedded simple events the
conditionally simple events of higher cyclic types: for the simple
event of type $a_1$, of the types $a_{k_1}[a_1]$, $a_{k_2}[a_1, a_{k_1}]$, ...;
for the simple event of type $a_{j_1}[a_1]$, of the types
$a_{l_1}[a_1, a_{j_1}]$, $a_{l_2}[a_1, a_{j_1}, a_{l_1}]$, ...; and so on.
We further come to the conclusion that again, in the second stage,
we can embed not the conditionally simple but the ordinary simple
events, embedding in turn (in the third stage) in the words
constituting them the conditionally simple events of still higher (in
the sense of the number of forbidden letters) cyclic types.

Increasing the number of stages of sequential embeddings, we
finally come to the embedding of events of cyclic types in which all
the letters except one are forbidden. Since for types of this kind
the difference between the conditionally simple and the ordinary simple
events of identical type no longer exists, the process of increasing
the number of stages and of new embeddings is thereby completed.

As a result we come to the conclusion that the path $q''$ occurs in
the event $S_1$ which is obtained as the result of a finite number of
sequential embeddings (divided into a number of stages) of simple
events of ever higher cyclic types into the simple event of type $y_i$.
In view of the arbitrariness of the choice of the path $q''$, the event
$S_1$ includes the event $S(y_i)$.

At the same time, as remarked above, a process of embeddings similar
to the one described cannot lead to an event containing paths of a
type other than $y_i$. Consequently, $S_1 = S(y_i)$, and the embedding
process we have described gives a regular expression $R_1$ for the event
$S(y_i)$ consisting of all paths of type $y_i$.

Dropping now from the regular expression $R_j$ all the symbols of the internal states (replacing them with the empty word), we obtain a regular expression $R$ which, as is easy to see, is nothing other than the regular expression for the sought event, represented in the automaton $A$ by the output signal $y_j$.

We have proved the following proposition. 

Theorem 1. An event represented in an arbitrary finite Mealy automaton (and, consequently, also in an arbitrary finite Moore automaton) by any set of output signals is necessarily regular. There exists a universal constructive technique (an algorithm for the analysis of finite automata) which makes it possible to find the regular expressions for events represented by sets of output signals in an arbitrary finite automaton.

The described algorithm for the analysis of finite automata can 
be given a form which is more convenient for practical applications 
[22]. To do this we shall work not with the events in the set of paths, 
but with formal expressions termed complexes. 

For any set $M$ of output signals of a given finite Mealy automaton $A$, the complex of type $M$ (or output-type complex) is the disjunction of all simple curtailed paths terminating with pairs $a_j x_i$ to which there correspond output signals contained in the set $M$. The complex of type $a_i[a_{j_1}, a_{j_2}, \ldots, a_{j_r}]$ (cyclic-type complex) is the formal expression obtained by joining with the disjunction sign all simple paths of type $a_i[a_{j_1}, a_{j_2}, \ldots, a_{j_r}]$ (simple paths leading from $a_i$ back to $a_i$ and avoiding the states $a_{j_1}, \ldots, a_{j_r}$) with the letter $a_i$ stricken, and enclosing the resulting formal polynomial in iteration brackets ($a_i, a_{j_1}, \ldots, a_{j_r}$ are any pairwise different internal states of the automaton and $0 \le r \le p - 1$, where $p$ is the number of internal states of the automaton $A$).

First step of the analysis algorithm. From the switching and output tables of the automaton and the given (representative) set of output signals, by sorting through all possible variants of simple paths we find the complex $K(M)$ of type $M$ and the complexes $K(a_i)$ for all the internal states $a_i$ of the automaton $A$.

Second step. From the complexes $K(a_i)$, by excluding unnecessary terms inside the iteration brackets, we find the complexes of higher cyclic types $a_i[a_{j_1}, \ldots, a_{j_r}]$ ($r \ge 1$) which are needed for the further constructions.
Third step. Starting from the complex of type $M$, we sequentially replace all symbols of the internal states $a_i$ by complexes of cyclic type until we obtain an expression $R$ containing no internal-state symbol at all. The replacement rule can be formulated as follows.

If the path $a_{j_0} x_{i_1} a_{j_1} x_{i_2} a_{j_2} \ldots$ occurs in the complex of type $M$, then the letter $a_{j_0}$ is replaced by the complex of type $a_{j_0}$, the letter $a_{j_1}$ by a complex of type $a_{j_1}[a_{j_0}]$, the letter $a_{j_2}$ by a complex of type $a_{j_2}[a_{j_0}, a_{j_1}]$, and so on. If the term $x_{i_1} a_{j_1} x_{i_2} a_{j_2} \ldots$ occurs in the complex of type $a_i[N]$, where $N$ is a (possibly empty) set of internal states different from $a_i$, then the letter $a_{j_1}$ is replaced by a complex of type $a_{j_1}[a_i, N]$, the letter $a_{j_2}$ by a complex of type $a_{j_2}[a_i, a_{j_1}, N]$, and so on.
2 2 : 


In the third step, as a result of the application of the replace- 
ment rule a finite number of times, we obtain the desired regular ex- 
pression R for the event represented in the automaton A by the set M 
of output signals. 
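The three steps can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the book's procedure verbatim: the automaton is given as dictionaries `delta[(state, letter)]` and `out[(state, letter)]`, input letters are single characters, the empty word $e$ is rendered as an empty string, iteration brackets as `(?:...)*` and the disjunction as `|`, so that the result can be checked with Python's `re` module. The function name `analyse` and the sample tables are hypothetical; the tables are our reading of the transitions implied by the complexes $K(v)$, $K(2)$ and $K(3)$ quoted in the example of this section. The second step is folded into `cyclic`, which enumerates the higher cyclic complexes directly from the tables instead of striking terms out of $K(a_i)$, and no simplification is attempted.

```python
def analyse(inputs, delta, out, initial, M):
    """Regular expression (Python `re` syntax) for the event represented
    by the set M of output signals in a finite Mealy automaton."""

    def cyclic(a, excluded):
        # Complex of type a[excluded]: all simple cycles a x q1 x ... qk x a
        # avoiding `excluded`, with the letter a stricken; each inner state
        # q_m is replaced by the complex of type q_m[a, excluded, q_1..q_{m-1}].
        cycles = []

        def walk(q, states, letters):
            for x in inputs:
                r = delta[(q, x)]
                if r == a:
                    cycles.append((states, letters + [x]))
                elif r not in excluded and r not in states:
                    walk(r, states + [r], letters + [x])

        walk(a, [], [])
        terms = []
        for states, letters in cycles:
            seen, term = excluded | {a}, letters[0]
            for q, x in zip(states, letters[1:]):
                term += cyclic(q, seen) + x
                seen = seen | {q}
            terms.append(term)
        # iteration of an empty disjunction is the empty word e
        return "(?:" + "|".join(terms) + ")*" if terms else ""

    # First step: complex of type M, i.e. simple curtailed paths from the
    # initial state ending with a pair (state, letter) whose output is in M.
    paths = []

    def walk_out(q, states, letters):
        for x in inputs:
            if out[(q, x)] in M:
                paths.append((states, letters + [x]))
            r = delta[(q, x)]
            if r not in states:
                walk_out(r, states + [r], letters + [x])

    walk_out(initial, [initial], [])
    # Third step: a_{j0} -> K(a_{j0}), a_{j1} -> K(a_{j1}[a_{j0}]), ...
    terms = []
    for states, letters in paths:
        seen, term = frozenset(), ""
        for q, x in zip(states, letters):
            term += cyclic(q, seen) + x
            seen = seen | {q}
        terms.append(term)
    return "(?:" + "|".join(terms) + ")" if terms else "(?!)"  # (?!) matches nothing


# Sample tables (our reconstruction): output v on (1, y) and (3, x), u elsewhere.
inputs = "xy"
delta = {(1, "x"): 2, (1, "y"): 3, (2, "x"): 3, (2, "y"): 2,
         (3, "x"): 3, (3, "y"): 2}
out = {key: "u" for key in delta}
out[(1, "y")] = out[(3, "x")] = "v"
rx = analyse(inputs, delta, out, 1, {"v"})
print(rx)
```

A word belongs to the event exactly when `re.fullmatch(rx, word)` succeeds; the string printed for these tables should be the example's result with its terms listed in a different order.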

The following proposition follows directly from the described 
algorithm. 

Theorem 2. Every event represented in a finite Mealy automaton (or Moore automaton) having n internal states admits a regular expression whose cyclic depth does not exceed n.

As an example let us find the regular expression for the event S
represented by the output signal v in the automaton whose switching 
and output tables were described in §1 of the present chapter (its 
graph is shown in Fig. 7). 

We find the complex $K(v)$ of type $v$ directly from the tables:

$$K(v) = 1y \vee 1y3x \vee 1x2x3x,$$

and the complexes of types 1, 2 and 3:

$$K(1) = e, \quad K(2) = \{y \vee x3y\}, \quad K(3) = \{x \vee y2x\}.$$

We write out some complexes of the higher cyclic types:

$$K(2[1]) = K(2), \quad K(3[1, 2]) = \{x\},$$
$$K(2[3]) = K(2[1, 3]) = \{y\}, \quad K(3[1]) = K(3).$$


Designating the operation of embedding of complexes by an arrow, we obtain the following sequence of embeddings:

$$
\begin{aligned}
K(v) &\to y \vee yK(3[1])x \vee xK(2[1])xK(3[1,2])x \\
&= y \vee y\{x \vee y2x\}x \vee x\{y \vee x3y\}x\{x\}x \\
&\to y \vee y\{x \vee yK(2[1,3])x\}x \vee x\{y \vee xK(3[1,2])y\}x\{x\}x \\
&= y \vee y\{x \vee y\{y\}x\}x \vee x\{y \vee x\{x\}y\}x\{x\}x.
\end{aligned}
$$

The last of the regular expressions obtained is the sought regular expression for the event S. It admits transformation and simplification using the relations that hold in the algebra of events.
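As a quick sanity check, the final expression can be transcribed into Python `re` syntax (iteration brackets `{...}` become `(?:...)*`, the sign $\vee$ becomes `|`) and compared by brute force with the automaton on all short words. The tables below are our reconstruction from the complexes $K(v)$, $K(2)$, $K(3)$ quoted above, not a copy of the §1 tables. The shorter pattern `simplified`, i.e. $y \vee yx \vee (x \vee y)\{x \vee y\}xx$, is one simplification we derived ourselves and is offered only as an illustration of the remark above.

```python
import itertools
import re

# Reconstructed tables: output v on the pairs (1, y) and (3, x), u elsewhere.
delta = {(1, "x"): 2, (1, "y"): 3, (2, "x"): 3, (2, "y"): 2,
         (3, "x"): 3, (3, "y"): 2}
out = {key: ("v" if key in {(1, "y"), (3, "x")} else "u") for key in delta}

# y ∨ y{x ∨ y{y}x}x ∨ x{y ∨ x{x}y}x{x}x, transcribed literally
book = r"y|y(?:x|y(?:y)*x)*x|x(?:y|x(?:x)*y)*x(?:x)*x"
# y ∨ yx ∨ (x ∨ y){x ∨ y}xx  (our own simplification)
simplified = r"y|yx|[xy][xy]*xx"


def in_event(word, initial=1):
    """True if the last output signal produced on `word` is v."""
    q, last = initial, None
    for ch in word:
        last, q = out[(q, ch)], delta[(q, ch)]
    return last == "v"


def agree(pattern, max_len=10):
    """Compare the pattern with the automaton on all words up to max_len."""
    words = ("".join(w) for n in range(max_len + 1)
             for w in itertools.product("xy", repeat=n))
    return all((re.fullmatch(pattern, w) is not None) == in_event(w) for w in words)


print(agree(book), agree(simplified))
```

A brute-force comparison like this only checks words up to a fixed length; it is a spot check, not a proof of equivalence.
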
§4. ABSTRACT SYNTHESIS OF FINITE AUTOMATA 

The abstract synthesis problem is the converse of the analysis problem for finite automata: it is necessary to find an effective method which makes it possible to find, from the regular expressions for the events, the switching and output tables of some finite automaton which represents these events.

The problem of the synthesis of Moore automata is more general than that of the synthesis of Mealy automata: since every Moore automaton can be interpreted as a Mealy automaton, by learning to synthesize the Moore automaton we also learn to synthesize the Mealy automaton. Therefore we shall solve the problem of synthesis of the Moore automaton.

Let there be given in some finite alphabet $X = (x_1, x_2, \ldots, x_n)$ the $p$ regular expressions $R_1, R_2, \ldots, R_p$. Let us number all occurrences of the letters of the alphabet $X$ in the expressions $R_1, R_2, \ldots, R_p$ with consecutive natural numbers, which hereafter we shall term the subscripts of the corresponding places of these expressions.


We emphasize particularly that the various occurrences of the same 
letter of the alphabet X will thus have different subscripts. 

In the development of the regular expression into a word, each 
of the sequentially written out letters of this word is identified 
with a particular occurrence of the corresponding letter in the ex- 
pression being developed. We agree to consider that in this identifi- 
cation we enter particular places of the regular expression, namely: 
when the last written letter is identified with the occurrence numbered with the subscript $j$, we consider that we are in the $j$-th place of the corresponding regular expression. We say that the $j$-th place of the regular expression $x_k$-follows the $i$-th place if, after identifying the last letter of some word $q$ with the occurrence having the subscript $i$, we can identify the last letter of the word $qx_k$ with the occurrence having the subscript $j$. In each regular expression there is also identified the initial place, to which the subscript 0 is assigned (the same for all the given regular expressions). If, in the process of identification, the first letter $x_k$ of some word is identified with some occurrence of it in the regular expression having the subscript $j$, then we consider that the $j$-th place $x_k$-follows the zero (initial) place (common for all the given regular expressions).

Finally, if the membership of some word p to the event with the 
regular expression R is established as the result of the identifica- 
tion of the last letter of the word p with its occurrence in R, having 
the subscript j, then the j-th place of the expression R is termed a 
final place of this expression. 


For any finite set of regular expressions $R_1, R_2, \ldots, R_p$ in the same alphabet $X$, using the order of operations in the algebra of events defined above, it is not difficult to compose the place sequence table. The rows of this table are designated by the letters of the alphabet $X$ and the columns by the subscripts of all the places of the expressions $R_1, R_2, \ldots, R_p$. At the intersection of the $x_k$-th row with the $j$-th column of the sequence table are written the subscripts of all the places which $x_k$-follow the $j$-th place. If
there are no such subscripts, in the corresponding place of the table 
we place a special symbol designating an empty set of subscripts. We 
agree to use an asterisk as this symbol. 

Let us construct the Moore automaton $A$ whose internal states are all possible subsets of place subscripts in the given regular expressions $R_1, R_2, \ldots, R_p$ (including the empty subset). The switching function $\delta$ of this automaton is constructed as follows: for any state $a_i$ of the automaton $A$ (a set of place subscripts of the given expressions) and for any letter $x_k$ of the input alphabet, the state $a_j = \delta(a_i, x_k)$ is defined as the set of subscripts of all places which $x_k$-follow at least one of the places whose subscripts occur in $a_i$.

The shifted output function $\mu$ of the Moore automaton $A$ is constructed for the output alphabet $Y$ consisting of all possible subsets (including the empty subset) of the set of all symbols $R_1, R_2, \ldots, R_p$ of the given regular expressions. For any state $a_i$ (set of subscripts) of the automaton $A$, we take as $\mu(a_i)$ the set of all those regular expressions $R_1, R_2, \ldots, R_p$ for which at least one of the subscripts occurring in $a_i$ is the subscript of a final place.

We have constructed a finite Moore automaton $A$. From the method of construction of its switching and output functions it follows directly that it represents (with the single-element set $\{0\}$ selected as the initial state) each of the given events $R_1, R_2, \ldots, R_p$ and the complement $S$ of their union. The event $R_i$ is represented by the set of all those output signals (sets of symbols $R_1, R_2, \ldots, R_p$) in whose composition the symbol $R_i$ occurs ($i = 1, 2, \ldots, p$). The event $S$ is obviously represented by the empty set of the symbols $R_1, R_2, \ldots, R_p$.
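A compact sketch of this construction in Python follows, under our own assumptions. Regular expressions are given as small syntax trees (the classes `Sym`, `Cat`, `Alt`, `Star`, `Eps` are hypothetical helpers, with `Star` standing for iteration brackets and `Eps` for the empty word $e$). The $x_k$-follow relation is computed with the usual first/last bookkeeping (which places can begin or end a word of a subexpression) instead of being read off a place sequence table by hand, and only states reachable from $\{0\}$ are built. No similar places are identified, so for the example $R = x\{y\}$, $S = yx\{x \vee y\}$ treated later in this section the sketch produces 8 states rather than 5; the minimization of §5 would merge the surplus ones.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Sym:            # one occurrence of a letter of X
    letter: str


@dataclass(frozen=True)
class Cat:            # concatenation R1 R2
    left: object
    right: object


@dataclass(frozen=True)
class Alt:            # disjunction R1 ∨ R2
    left: object
    right: object


@dataclass(frozen=True)
class Star:           # iteration brackets {R}
    body: object


@dataclass(frozen=True)
class Eps:            # the empty word e
    pass


def places(node, counter, letter_of, follow):
    """Number the letter occurrences (places) of `node`, record which places
    x-follow which, and return (accepts e, first places, last places)."""
    if isinstance(node, Eps):
        return True, set(), set()
    if isinstance(node, Sym):
        j = next(counter)
        letter_of[j], follow[j] = node.letter, set()
        return False, {j}, {j}
    if isinstance(node, Star):
        _, first, last = places(node.body, counter, letter_of, follow)
        for i in last:                       # after one pass the body may restart
            follow[i] |= first
        return True, first, last
    n1, f1, l1 = places(node.left, counter, letter_of, follow)
    n2, f2, l2 = places(node.right, counter, letter_of, follow)
    if isinstance(node, Alt):
        return n1 or n2, f1 | f2, l1 | l2
    for i in l1:                             # Cat: right part follows left part
        follow[i] |= f2
    return n1 and n2, f1 | (f2 if n1 else set()), l2 | (l1 if n2 else set())


def synthesize(exprs, alphabet):
    """Moore automaton whose states are sets of place subscripts; each state
    is labeled with the names of the expressions having a final place in it."""
    counter, letter_of, follow, final = count(1), {}, {0: set()}, {}
    for name, node in exprs.items():
        accepts_e, first, last = places(node, counter, letter_of, follow)
        follow[0] |= first                   # common initial place 0
        final[name] = last | ({0} if accepts_e else set())
    start = frozenset({0})
    delta, label, todo = {}, {}, [start]
    while todo:                              # build only the reachable states
        state = todo.pop()
        if state in label:
            continue
        label[state] = frozenset(n for n, f in final.items() if state & f)
        for x in alphabet:
            nxt = frozenset(j for i in state for j in follow[i] if letter_of[j] == x)
            delta[(state, x)] = nxt
            todo.append(nxt)
    return start, delta, label


R = Cat(Sym("x"), Star(Sym("y")))                                 # x{y}
S = Cat(Cat(Sym("y"), Sym("x")), Star(Alt(Sym("x"), Sym("y"))))   # yx{x ∨ y}
start, delta, label = synthesize({"R": R, "S": S}, "xy")
```

Feeding a word letter by letter through `delta` from `start` and reading `label` of the state reached tells which of the given events contain that word.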

As a result we have proved the following proposition. 

Theorem 1. Any regular event can be represented in a finite automaton. There exists a single constructive technique (the synthesis algorithm) which makes it possible, from any finite set of regular events given by regular expressions, to construct the finite Moore or Mealy automata representing these events.

Combining the proved theorem with the result obtained in §2, we 
obtain the following result. 

Theorem 2. Regular events, and only regular events, are representable in finite automata.

A similar result for automata of a special form (neural networks) 
and for a more awkward form of definition of the regular event has 
been obtained previously by Kleene [40]. 

The following proposition also follows from the results obtained 
above. 

Theorem 3. The intersection of two (and therefore of any finite 
number as well) regular events and the complement (in the set of all 
words in the basic alphabet) of any regular event are also regular 
events. 

The algorithm described above for the synthesis of finite autom- 
ata also admits the following interpretation which is more convenient 
for practical purposes [212]. 

Let there be given the $p$ regular expressions $R_1, R_2, \ldots, R_p$ in the arbitrary finite alphabet $X = (x_1, x_2, \ldots, x_n)$. If any of the expressions $R_i$ is the disjunction of several terms, then we can without loss of generality consider that it is enclosed in ordinary (noniterative) brackets. Specially introduced separation symbols (vertical bars) standing between any two symbols (letters, brackets, disjunction signs) of these expressions, and also to the left of an expression (initial place) and to the right of an expression (final place), will be termed places in the expressions $R_1, R_2, \ldots, R_p$.
Places having a letter of the basic alphabet $X$ standing directly on their left, together with the initial place, are termed basic places; the places having a letter of the alphabet $X$ standing directly on their right are termed prebasic. The initial places of all the expressions $R_1, R_2, \ldots, R_p$ are identified with one another in one single initial place. We designate all the basic places with different nonnegative integers: the basic subscripts of these places. Here the initial place takes the basic subscript 0.

The operation of each basic subscript extends not only to the cor- 
responding place, but also to the places (basic and nonbasic) which 
are subordinate to it. The place subordination rule expresses the or- 
der of the operations in the algebra of events. It is defined by the 
following subscript extension rule. 

The place subscripts ahead of any brackets (iterative or conven- 
tional) extend to the initial places of all the terms standing inside 
these brackets. The subscripts of the final place of any term enclosed 
in brackets extend to the place directly following these brackets. 
Place subscripts directly preceding iterative brackets or symbols of 
an empty word extend to the place directly following these brackets 
(respectively after the given symbol e). Finally, place subscripts 
following directly after iterative brackets extend to the initial 
places of all the terms enclosed in these brackets. 

All the subscripts appearing on the basic and nonbasic places as the result of the application of the rule just formulated are termed nonbasic. The rule itself must be applied repeatedly until its application no longer leads to the appearance of new subscripts on any place.

The indexing of the given regular expressions, the labeling of the 
places and the extension of the subscripts according to the formulated 
rule constitute the first step of the synthesis algorithm. 

The second step consists in the construction of the switching 
table of the sought automaton A. Here the input signals are the letters 
of the original alphabet X, and the internal states of the automaton 
are identified with the sets of the basic subscripts. Let us agree for 
definiteness to denote these sets by the disjunction of the component 
subscripts, and the empty set of subscripts by an asterisk. 

The rule for the construction of the automaton switching table 
amounts to the following. 

The single-element set consisting of the subscript 0 serves as the initial state of the automaton $A$. The state $a_i$ is transformed by the input signal $x_k$ into the state $a_j$, consisting of the basic subscripts of all those basic places, separated by the letter $x_k$ from the prebasic places directly preceding them, for which the subscripts (basic or nonbasic) of these prebasic places contain at least one of the subscripts occurring in the state $a_i$.

In practical application of the formulated rule it is convenient 
to separate the basic subscripts, placing them above a horizontal line 
specially drawn for this purpose. It is also advisable to separate all 
the subscripts (basic and nonbasic) of the prebasic places, for example 
enclosing them in a rectangular frame. In the construction of the 
switching table it is sufficient to limit ourselves to the states which actually appear in the process of constructing the table, starting from the initial (zero) state.
The third step of the synthesis consists in the construction of
the shifted output table, or, what is the same, in the labeling of the 
states of the automaton A with the output signals corresponding to 
them. As the output signals we select the various sets of the symbols 
of the initial regular expressions (including the empty set). The state 
labeling rule consists in the following. 


The state $a_i$ is labeled with the set of those symbols of the expressions $R_1, R_2, \ldots, R_p$ whose final place subscripts (basic and nonbasic) include at least one subscript from $a_i$.

The states labeled with the empty set of symbols are also termed 
unlabeled. 

We note that the constructed Moore automaton represents the event $R_i$ by the set of all those output signals which contain the symbol $R_i$ ($i = 1, 2, \ldots, p$).

The fourth step of the synthesis algorithm consists in the rede- 
signation of the internal states and the output signals to obtain a 
simpler writing of the switching table and the shifted output table. 
Here the internal states are most frequently numbered with the consecutive natural numbers $1, 2, \ldots, k$.

Finally, the fifth step of the synthesis algorithm is used when 
we are required to synthesize a Mealy automaton rather than a Moore 
automaton. It amounts to the construction of the conventional (unshifted) output table. As follows from §1 of the present chapter, for this it is sufficient to substitute in the switching table, in place of the internal states, the output signals which label them.

In the solution of practical problems which arise in the synthesis of automata, it is frequently convenient to assign identical basic subscripts to certain basic places, thereby identifying these places. Such an identification is possible if identical sets of prebasic and final places are subordinated to the identified places (places satisfying this condition are termed similar).

Another case when it is possible to identify places relates to 
the so-called corresponding places. All those places in the various 
regular expressions $R_1, R_2, \ldots, R_p$, or in the different terms enclosed
in the same brackets, to which identical paths (sets of words) lead 
from the initial place or correspondingly from the place directly pre- 
ceding the brackets, are termed corresponding. 

In the use of the synthesis algorithm described above, the basic subscripts of corresponding places always occur together in the states of the automaton being synthesized. It is precisely this that makes their identical indexing possible. The possibility of identifying similar places is substantiated by the minimization algorithm described in §5 of the present chapter.

We note that the places should be identified only with respect to
one of the criteria (similarity or correspondence), since simultaneous 
identification with respect to both criteria can lead to errors. In 
particular, since the initial places are actually identified with re- 
spect to the correspondence criterion, we cannot, generally speaking, 
with the existence of more than one event identify the initial place 
in any event with another place using the similarity criterion. 

The validity of the following proposition results directly from 
the algorithm described. 

Theorem 4. Events given by the regular expressions $R_1, R_2, \ldots, R_p$ in some finite alphabet $X$ can be represented in a finite automaton (Mealy or Moore) having no more than $2^{n+1}$ internal states, where $n$ is the total number of occurrences of the letters of the alphabet $X$ in the expressions $R_1, R_2, \ldots, R_p$.
Let us consider what changes need to be made in the synthesis algorithm when, in addition to the initial events $R_1, R_2, \ldots, R_p$, there is also given the forbidden region $S$ in the alphabet $X = (x_1, x_2, \ldots, x_n)$.

The forbidden region can be specified either with the aid of some regular expression, or as the ensemble of words in the alphabet $X$ not occurring in any of the events $R_1, R_2, \ldots, R_p$. These two methods are essentially equivalent, since we can pass from either method of specification to the other.

The forbidden region $S$ by its very definition admits right multiplication by the ensemble $F$ of all words (including the empty word) in the alphabet $X$: $SF = S$. Therefore, when the forbidden region is specified by the regular expression $R$, we can without loss of generality assume that $R$ has the form

$$R = R'\{x_1 \vee x_2 \vee \cdots \vee x_n\}.$$

The synthesis algorithm described above also solves the problem in the presence of a forbidden region. However, in this case many transitions in the synthesized automaton are redundant in the sense that they will never be used in actual operation of the automaton. The problem consists in determining all such transitions and constructing, in place of the conventional (completely determinate) automaton, a partial automaton in whose switching and output tables dashes stand in the places of the forbidden transitions. The conversion to the partial automaton gives additional possibilities for subsequent simplification of the automaton.

When the forbidden region is specified as the ensemble of words in the alphabet $X$ which do not occur in any of the given regular expressions $R_1, R_2, \ldots, R_p$, this problem is solved by a quite obvious method. After the synthesis algorithm described above has been performed, the output signal designated by the empty set of symbols $R_1, R_2, \ldots, R_p$ will correspond to the appearance of a forbidden input word. Consequently, it is sufficient to replace this output signal in the output table by a dash and to put a dash in all the places of the switching table corresponding to the appearance of the forbidden output signal (when the output table is superimposed on the switching table, the places marked with a dash in the two tables must coincide).

When the forbidden region is specified by the regular expression $S$, we apply the usual synthesis algorithm to the expressions $S, R_1, R_2, \ldots, R_p$ and consider as forbidden all outputs designated by sets which include the symbol $S$. Forbidden outputs
in the output table are replaced by dashes, which are transferred to 
the switching table using the method described above. It is clear that 
this technique actually leads to the solution of the posed problem. 

In this case we should assume that the expression $S$ has the form

$$S = S'\{x_1 \vee x_2 \vee \cdots \vee x_n\}.$$

If the initially given expression for the forbidden region does not satisfy this condition, it must be replaced by the expression

$$S\{x_1 \vee x_2 \vee \cdots \vee x_n\}.$$

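As an example, let us consider the synthesis of the partial Mealy automaton representing the event $R = x\{y\}$ in the presence of the forbidden region $S = yx\{x \vee y\}$. In the first step we label the places, assign the subscripts and extend them in the expressions $R$ and $S$, using the possibility of identifying similar places. The resulting basic subscripts are: in $R$, 0 on the initial place and 1 on the places after $x$ and after $y$ (these two places are similar); in $S$, 2 on the place after $y$ and 3 on the place after $x$ and on the places after the two letters inside the iteration brackets (these three places are similar).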

Performing the second and third steps of the algorithm, we arrive at the labeled switching table of the Moore automaton (the top row gives the label of each state):

             ( )   R   ( )  ( )   S
              0    1    2    *    3
        x     1    *    3    *    3
        y     2    1    *    *    3

In the fourth step we introduce the redesignation 0 → 1, 1 → 2, 2 → 3, * → 4, 3 → 5, ( ) → u, R → v, S → w (here the brackets ( ) designate the empty set of symbols R and S). After this we obtain the labeled switching table

              u    v    u    u    w
              1    2    3    4    5
        x     2    4    5    4    5
        y     3    2    4    4    5


Completing in the fifth step the conversion to the Mealy automaton, we obtain the switching and output tables

        switching               output
              1  2  3  4  5           1  2  3  4  5
        x     2  4  5  4  5     x     v  u  w  u  w
        y     3  2  4  4  5     y     u  v  u  u  w


Finally, we perform the conversion to the partial automaton. In the present case we shall consider the forbidden region to correspond to the signal w. Then the switching and output tables take the form

              1  2  3  4  5           1  2  3  4  5
        x     2  4  -  4  -     x     v  u  -  u  -
        y     3  2  4  4  -     y     u  v  u  u  -


However, state 5 is redundant, since the automaton can never pass into it starting from the initial state 1. Discarding this redundant state, we arrive at the final switching and output tables

              1  2  3  4           1  2  3  4
        x     2  4  -  4     x     v  u  -  u
        y     3  2  4  4     y     u  v  u  u
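The fifth step and the conversion to a partial automaton can be replayed mechanically. The sketch below is ours, and the helper names `moore_to_mealy`, `make_partial` and `reachable` are hypothetical: it starts from the redesignated labeled switching table of the example, takes as the Mealy output of each pair (state, letter) the label of the state reached, replaces the forbidden signal w and the matching transitions by dashes, and then keeps only the states reachable from state 1.

```python
def moore_to_mealy(switch, label):
    """Fifth step: the output for (state, x) is the label of the state reached."""
    return {key: label[nxt] for key, nxt in switch.items()}


def make_partial(switch, output, forbidden):
    """Put a dash in both tables wherever the forbidden signal would appear."""
    sw, outp = dict(switch), dict(output)
    for key, signal in output.items():
        if signal == forbidden:
            sw[key] = outp[key] = "-"
    return sw, outp


def reachable(switch, inputs, initial):
    """States the automaton can enter from `initial` (dashes are never entered)."""
    seen, todo = {initial}, [initial]
    while todo:
        q = todo.pop()
        for x in inputs:
            r = switch[(q, x)]
            if r != "-" and r not in seen:
                seen.add(r)
                todo.append(r)
    return seen


# Redesignated labeled switching table of the example (Moore automaton).
inputs = "xy"
label = {1: "u", 2: "v", 3: "u", 4: "u", 5: "w"}
rows = {1: (2, 3), 2: (4, 2), 3: (5, 4), 4: (4, 4), 5: (5, 5)}  # (on x, on y)
switch = {(q, x): nxt for q, pair in rows.items() for x, nxt in zip(inputs, pair)}

output = moore_to_mealy(switch, label)
sw, outp = make_partial(switch, output, "w")
keep = reachable(sw, inputs, 1)
final_sw = {k: v for k, v in sw.items() if k[0] in keep}
final_out = {k: v for k, v in outp.items() if k[0] in keep}
for x in inputs:
    print(x, [final_sw[(q, x)] for q in sorted(keep)],
          [final_out[(q, x)] for q in sorted(keep)])
```

Dropping unreachable states is the same step that makes an automaton connected, which the next section assumes.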


§5. MINIMIZATION OF ABSTRACT AUTOMATA 


As indicated above, we consider the abstract automaton as a device for the realization of automaton mappings. In connection with this it is natural not to differentiate automata which are equivalent to one another, i.e., automata which induce identical mappings.

The primary task resolved in the present section is that of the 
minimization of automata, i.e., the problem of finding the automaton 
with the minimal number of states in the class of all automata equi- 
valent to the given one. The method presented for the solution of this 
problem is a development of the ideas of Mealy [55], Aufenkamp and 
Hohn [3,4]. 

Let $a$ and $b$ be two states of the same or of two different Mealy automata having common input and output alphabets. If for every input signal $x_k$ the output signals determined by the pairs $(a, x_k)$ and $(b, x_k)$ are identical, then the states $a$ and $b$ are termed 1-equivalent states.

If 1-equivalent states are transformed by every input signal $x_k$ into states which are also 1-equivalent to one another, then they are termed 2-equivalent. If 2-equivalent states are transformed by every input signal into states which are 2-equivalent to one another, then they are termed 3-equivalent, etc.

It is easy to see that $i$-equivalent states give rise to identical output words when any input word of length $i$ is applied to them ($i = 1, 2, \ldots$).

The $i$-equivalence relation for any $i = 1, 2, \ldots$ has the properties of reflexivity, symmetry and transitivity. This implies that the set of all internal states of a given Mealy automaton is partitioned by this relation into disjoint classes of states which are $i$-equivalent to one another. We term such classes $i$-equivalence classes or $i$-classes.

States which are $i$-equivalent for all $i = 1, 2, \ldots$ are termed equivalent states, and the classes defined by the equivalence relation are termed equivalence classes or $\infty$-classes.


The validity of the following proposition follows directly from 
the definition of the states which are equivalent to one another. 

Theorem 1. Two states of the same or of two different Mealy automata are equivalent to one another if and only if the application to them of any input word causes the appearance of the same output word.

This proposition makes it possible to formulate the following re- 
sult as well. 

Theorem 2. Two Mealy automata are equivalent to one another (in 
the sense of the coincidence of the automaton representations which 
they induce) if and only if their initial states are equivalent. 

The application of the same input word $p$ to two equivalent states $a$ and $b$ again transforms them into equivalent states $ap$ and $bp$. Since equivalent states are at the same time 1-equivalent, for any input signal $x_k$ the pairs $(a, x_k)$ and $(b, x_k)$ define identical output signals.

Thus, for every Mealy automaton $A$ we can construct a new Mealy automaton $B$ with the same input and output alphabets as automaton $A$, taking as the set of its internal states the set of all equivalence classes of the automaton $A$. The transitions and the outputs in automaton $B$ are determined as follows: the equivalence class $K_i$ is transformed by the input signal $x_k$ into the equivalence class $K_j$ containing the state $a_s x_k$, where $a_s$ is any state contained in the class $K_i$. To the pair $K_i x_k$ there is associated in this case the output signal determined by the pair $a_s x_k$. We shall term the automaton thus constructed the canonical minimization of the Mealy automaton $A$.

The validity of the following proposition follows from the method of construction of automaton $B$ and from proposition 1.

Theorem 3. In the canonical minimization of any Mealy automaton, 
any two different internal states are not equivalent to one another. 
For the realization of automaton correspondence it is sufficient
to limit ourselves to the consideration only of the so-called con- 
nected automata, i.e., those automata in which every state is attain- 
able, or, in other words, which as the result of the application of a 
suitable input word can be transformed from the initial state into any 
other internal state. 

Indeed, if the mapping $\varphi$ is induced by the unconnected automaton $A$, then the attainable states of the automaton $A$ form a new automaton $B$ which is connected and induces the same mapping $\varphi$. We
note that as a result of the application of the synthesis algorithm 
described in the preceding section, connected automata are always 
obtained. 

Let us consider some connected Mealy automaton $A$ which induces the specified automaton mapping $\varphi$. The canonical minimization $B$ of the automaton $A$ is also connected and realizes the same mapping $\varphi$ (with the equivalence class $K_0$ containing the initial state of the automaton $A$ selected as the initial state).

Let $D$ be any automaton realizing the same mapping and let $d_0$ be its initial state. Since the automaton $B$ is connected, for each of its states $K_i$ we can select an input word $p_i$ such that $K_0 p_i = K_i$ ($i \in M$). Let us construct the mapping $\psi$ of the set of states of automaton $B$ into the set of states of automaton $D$, setting $\psi(K_i) = d_0 p_i$ ($i \in M$).

It is clear that the initial states $K_0$ and $d_0$ of the automata $B$ and $D$ are equivalent to one another. But then the states $K_i$ and $d_i = \psi(K_i)$, which are obtained from them by the same input word $p_i$, are also equivalent for any $i \in M$. If $\psi(K_i) = \psi(K_j)$, then this implies the equivalence of the states $K_i$ and $K_j$. By proposition 3 this means that $K_i = K_j$. Thus, the correspondence $\psi$ is one-to-one, so that $D$ has at least as many internal states as $B$, which implies the validity of the following proposition.

Theorem 4. The canonical minimization of a connected Mealy automaton $A$ which induces any given automaton mapping $\varphi$ is the automaton having the smallest possible number of internal states among all Mealy automata which induce the same mapping $\varphi$ (i.e., among all automata equivalent to the automaton $A$).

This statement completely resolves the problem of the minimiza- 
tion of the Mealy automata under the condition that there exists the 
constructive technique for the construction of the equivalency classes 
for any given (connected) Mealy automaton. Such a technique has been suggested by Aufenkamp and Hohn [4] for the case of finite Mealy automata. It is based on the following easily proved proposition.

Theorem 5. If for some $i$ the partition of the states of the automaton into $(i + 1)$-classes coincides with the partition into $i$-classes, then it is also the partition into $\infty$-classes.

Indeed, if every pair $(a_s, a_t)$ of $i$-equivalent states is also $(i + 1)$-equivalent, then the states $a_s$ and $a_t$ are transformed by any input signal $x_r$ into states which are $i$-equivalent to one another. But then they are transformed by this same signal also into states which are $(i + 1)$-equivalent to one another. Consequently, the states $a_s$ and $a_t$ are not only $(i + 1)$-equivalent but also $(i + 2)$-equivalent to one another.

Continuing in the same way, we find that the states $a_s$ and $a_t$ are $n$-equivalent for all $n = i, i + 1, i + 2, \ldots$ and, consequently, are equivalent states. In view of the arbitrariness of the choice of the states $a_s$ and $a_t$, proposition 5 is proved.

The Aufenkamp-Hohn algorithm for the construction of the equivalence classes ($\infty$-classes) is based on the sequential construction of the $i$-classes for $i = 1, 2, \ldots$. Since the partition into $(i + 1)$-classes is a subpartition of the partition into $i$-classes, for a finite automaton $A$ we obtain after a finite number of steps, on the basis of theorem 5, the sought partition into $\infty$-classes.


The partition into 1-classes is obtained directly from the output
table of the automaton: all states to which identical columns of the
output table correspond are combined into the same 1-class. Then we
construct the so-called 1-table, obtained by replacing each internal
state in the switching table of the automaton by the 1-class which
contains it.

Into a single 2-class we combine all states belonging to the same
1-class to which identical columns of the 1-table correspond. Then we
proceed similarly: replacing the states in the switching table by the
2-classes which contain them, we obtain the 2-table. From the 2-table
we find the 3-classes, combining into one 3-class all states of the
same 2-class to which identical columns of the 2-table correspond.

Having arrived after a finite number of steps at the partition into
∞-classes, we construct the canonical minimization of the original
automaton A directly from its switching and output tables.
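The successive refinement just described can be sketched in Python. This is only an illustration under our own assumptions, not the authors' formulation: the automaton is finite and everywhere defined, its switching and output tables are dictionaries `delta[(a, x)]` and `lam[(a, x)]`, and the function names are hypothetical. Integer class labels play the role of the letters a, b, c, ... used in the text.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Table = Dict[Tuple[Hashable, Hashable], Hashable]


def _number_classes(signature: Dict[Hashable, Hashable]) -> Dict[Hashable, int]:
    """Give each distinct signature a small integer class label."""
    labels: Dict[Hashable, int] = {}
    return {a: labels.setdefault(sig, len(labels)) for a, sig in signature.items()}


def mealy_equivalence_classes(states: Sequence, inputs: Sequence,
                              delta: Table, lam: Table) -> List[List]:
    """Partition the states of a finite Mealy automaton into equivalence classes."""
    # 1-classes: states whose columns in the output table coincide.
    cls = _number_classes({a: tuple(lam[a, x] for x in inputs) for a in states})
    while True:
        # Column of state a in the i-table: the i-classes of its successors.
        # An (i+1)-class groups states of one i-class with equal columns.
        new_cls = _number_classes(
            {a: (cls[a],) + tuple(cls[delta[a, x]] for x in inputs) for a in states}
        )
        # Theorem 5: once a step splits no class, the partition is final.
        if len(set(new_cls.values())) == len(set(cls.values())):
            break
        cls = new_cls
    groups: Dict[int, List] = {}
    for a in states:
        groups.setdefault(cls[a], []).append(a)
    return sorted(groups.values())
```

Because each new partition refines the previous one, an unchanged number of classes means an unchanged partition, which is the stopping condition of Theorem 5.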

As a result we have constructed a minimization algorithm for any
finite Mealy automaton. For Moore automata certain changes must be
introduced into this algorithm: if we interpret a Moore automaton as a
Mealy automaton and minimize it by the algorithm described, we obtain
an automaton which, although equivalent to the original, may no longer
be a Moore automaton.

For a Moore automaton to remain a Moore automaton under minimization,
it is evidently necessary and sufficient that no equivalence class
contain states labeled by different output signals.

This condition is most simply satisfied by introducing for the Moore
automaton the concept of 0-equivalence of states and the partition of
the set of states into 0-classes: we shall term any identically
labeled states of a Moore automaton 0-equivalent. If two 0-equivalent
states are transformed by every input signal into two 0-equivalent
states, they are termed 1-equivalent.

All further constructions (determination of the i-classes for i ≥ 2,
of the equivalence classes, and of the canonical minimization) are
performed just as for Mealy automata. Of course, in the case of Moore
automata the canonical minimization B can be specified not by an
output table but by the shifted output table: each state of the
automaton B (an equivalence class) is labeled by the same output
signal that labels the states of the original automaton contained in
it.
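For Moore automata the same refinement needs only a different starting point, the 0-classes formed from the state labels. The sketch below is our own illustration with hypothetical names: transitions are a dictionary `delta[(a, x)]` and `label[a]` is the output signal marking state a. Since every later class lies inside one 0-class, differently labeled states are never merged.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def moore_equivalence_classes(states: Sequence, inputs: Sequence,
                              delta: Dict[Tuple[Hashable, Hashable], Hashable],
                              label: Dict[Hashable, Hashable]) -> List[List]:
    """Equivalence classes of a finite Moore automaton, starting from 0-classes."""
    def number(signature):
        ids: Dict[Hashable, int] = {}
        return {a: ids.setdefault(sig, len(ids)) for a, sig in signature.items()}

    # 0-classes: identically labeled states.
    cls = number({a: label[a] for a in states})
    while True:
        # (i+1)-classes: same i-class and successors in the same i-classes.
        new_cls = number(
            {a: (cls[a],) + tuple(cls[delta[a, x]] for x in inputs) for a in states}
        )
        if len(set(new_cls.values())) == len(set(cls.values())):
            break
        cls = new_cls
    groups: Dict[int, List] = {}
    for a in states:
        groups.setdefault(cls[a], []).append(a)
    return sorted(groups.values())
```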

However, the theory of minimization of Moore automata in the form just
described is not fully equivalent to the corresponding theory for
Mealy automata. In particular, the analogue of theorem 1 does not hold
for Moore automata.

To make the two theories equivalent, one must take as the reaction of
the Moore automaton to the input word p not the output word q obtained
from the general definition of the automaton given in §1, but the word
$y_i q$, where $y_i$ is the output signal labeling the state in which the
automaton was before the word p was applied. With this definition of
the reaction of a Moore automaton to an input word, the propositions
obtained from theorems 1, 2, 3, 4 of the present section by replacing
every Mealy automaton in their formulation by a Moore automaton will
evidently be valid. Theorem 5, of course, remains valid for Moore
automata with the usual definition of their output reactions.

When we return to the usual understanding of the output reactions of
the automaton, only those states into which the automaton cannot pass
from other states turn out to be in a special situation (for connected
automata only the initial state can have this property). It is easy to
verify that, to restore the parallelism between the theories of
minimization of Moore and Mealy automata, it is sufficient in this case
not to label such states with any output signal at all. This allows
such states to be assigned to any 0-class when the 0-classes are
formed, and thereby increases the possibilities of minimization.

However, the parallelism that appears here holds not for the
minimization of conventional, everywhere-defined automata, but only
after passing to the more general problem of minimizing partial
automata.

The problem of minimizing partial automata is essentially as follows.
We are given a partial automaton A (Moore or Mealy) which induces a
partial automaton mapping $\varphi$ defined on some set M of words in the
input alphabet. It is required to construct a partial automaton B
(Moore or Mealy, respectively) which induces a partial automaton
mapping coinciding with $\varphi$ on the set M and which has the smallest
number of internal states among all automata (Moore or Mealy)
satisfying this condition.

Since there is only a finite number of different partial automata
which have the same input and output alphabets as the given finite
partial automaton A and whose number of states does not exceed that of
A, the problem formulated is algorithmically solvable. However, the
existing methods for solving it exactly involve extensive exhaustive
search (see, for example, Ginsburg [19]) and are therefore unsuitable
in practice.

In practice, the following technique is usually used for minimizing
partial automata: mentally filling in the blank places in the switching
and output tables of the given partial automaton A, we combine the
states into k-classes and minimize by the same rules as for
conventional (everywhere-defined) automata. All the possible variants
of combining states into classes are checked, and from the resulting
canonical minimizations we select the one with the smallest number of
states.

This technique does solve the problem of constructing a partial
automaton B with a smaller number of states than the original
automaton A, such that the partial automaton mapping $\psi$ induced by B
coincides with the partial automaton mapping $\varphi$ induced by A on the
domain of definition of $\varphi$. In this case the domain of definition
of $\psi$ is, generally speaking, larger than that of $\varphi$.

We shall show how the described minimization technique works, using as
an example the partial Mealy automaton A given by the following
switching and output tables (a dash marks an undefined entry):

        | 1  2  3  4              | 1  2  3  4
      --+-----------            --+-----------
      x | 2  -  2  4            x | u  -  u  u
      y | -  3  4  4            y | -  u  v  v


Minimizing the given automaton, we obtain two initial possibilities of
combining the states into 1-classes that lead to the smallest number
of classes:

      $a_1 = (1, 3, 4)$, $b_1 = (2)$   and   $a_1 = (1, 2)$, $b_1 = (3, 4)$.


Let us consider the first possibility. The 1-table is written

        | 1   2   3   4
      --+---------------
      x | b1  -   b1  a1
      y | -   a1  a1  a1


From the 1-table we see that the class $a_1$ must be divided into two
2-classes, $a_2 = (1, 3)$ and $c_2 = (4)$; the third 2-class $b_2$
coincides with the 1-class $b_1 = (2)$. The 2-table has the form

        | 1   2   3   4
      --+---------------
      x | b2  -   b2  c2
      y | -   a2  c2  c2
From the 2-table we see that the 3-classes coincide with the
2-classes, which are therefore the desired ∞-classes. In this variant
the canonical minimization is represented by the tables

        | a2  b2  c2            | a2  b2  c2
      --+-----------          --+-----------
      x | b2  -   c2          x | u   -   u
      y | c2  a2  c2          y | v   u   v


In the second variant of the minimization, the partition into
1-classes $a_1 = (1, 2)$ and $b_1 = (3, 4)$ leads to the 1-table

        | 1   2   3   4
      --+---------------
      x | a1  -   a1  b1
      y | -   b1  b1  b1


and to the partition into 2-classes $a_2 = (1, 2)$, $b_2 = (3)$,
$c_2 = (4)$. The 2-table is written

        | 1   2   3   4
      --+---------------
      x | a2  -   a2  c2
      y | -   b2  c2  c2


which implies that the partition into 2-classes obtained will not break
down further and, consequently, coincides with the partition into
∞-classes. The canonical minimization in this variant is given by the
tables

        | a2  b2  c2            | a2  b2  c2
      --+-----------          --+-----------
      x | a2  a2  c2          x | u   u   u
      y | b2  c2  c2          y | u   v   v


The two canonical minimizations obtained are essentially different: to
the input word y, on which the original automaton A is undefined, the
first responds with the output word v and the second with the output
word u.
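The two variants can be checked mechanically. The sketch below is our own, with hypothetical names: it transcribes the tables above into dictionaries, omitting blank entries, and takes state 1 of A and state a of each minimization as the initial states (an assumption consistent with the closing remark about the word y). It then confirms, by trying every word up to a fixed length, that both minimizations reproduce A wherever A is defined. This is a bounded check, not a proof.

```python
from itertools import product
from typing import Dict, Optional, Tuple

Table = Dict[Tuple[str, str], str]


def respond(delta: Table, lam: Table, start: str, word: str) -> Optional[str]:
    """Output word of a partial Mealy automaton, or None where undefined."""
    state, out = start, []
    for x in word:
        if (state, x) not in delta:      # a blank entry of the table
            return None
        out.append(lam[state, x])
        state = delta[state, x]
    return "".join(out)


# Original partial automaton A (initial state 1); blanks are simply absent.
A = ({("1", "x"): "2", ("3", "x"): "2", ("4", "x"): "4",
      ("2", "y"): "3", ("3", "y"): "4", ("4", "y"): "4"},
     {("1", "x"): "u", ("3", "x"): "u", ("4", "x"): "u",
      ("2", "y"): "u", ("3", "y"): "v", ("4", "y"): "v"}, "1")
# First variant: a = (1, 3), b = (2), c = (4).
B1 = ({("a", "x"): "b", ("c", "x"): "c",
       ("a", "y"): "c", ("b", "y"): "a", ("c", "y"): "c"},
      {("a", "x"): "u", ("c", "x"): "u",
       ("a", "y"): "v", ("b", "y"): "u", ("c", "y"): "v"}, "a")
# Second variant: a = (1, 2), b = (3), c = (4).
B2 = ({("a", "x"): "a", ("b", "x"): "a", ("c", "x"): "c",
       ("a", "y"): "b", ("b", "y"): "c", ("c", "y"): "c"},
      {("a", "x"): "u", ("b", "x"): "u", ("c", "x"): "u",
       ("a", "y"): "u", ("b", "y"): "v", ("c", "y"): "v"}, "a")


def agrees_on_domain(orig, cand, max_len: int = 6) -> bool:
    """Does cand reproduce orig on every word (up to max_len) where orig is defined?"""
    for n in range(1, max_len + 1):
        for letters in product("xy", repeat=n):
            word = "".join(letters)
            q = respond(*orig, word)
            if q is not None and respond(*cand, word) != q:
                return False
    return True
```

Both candidates pass, yet `respond(*B1, "y")` and `respond(*B2, "y")` return different words, illustrating why the result of this technique depends on how the blanks are filled.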

§6. STRUCTURAL SYNTHESIS OF FINITE AUTOMATA 
In the structural synthesis stage we select the elementary automata
from which the structural diagram of the given abstract Mealy or Moore
automaton is synthesized. We shall assume that the elementary automata
are of two kinds: elementary automata with memory, or memory elements
(triggers, delay lines, etc.), and elementary automata without memory,
also termed logic elements. For simplicity we shall limit ourselves to
the case in which only one type of memory element is available.

The input and output signals of both the elementary automata and the
automaton under consideration as a whole are designated (coded) by
finite sequences of letters of some fixed finite alphabet, termed the
structural alphabet. Usually the binary alphabet, consisting of the two
letters 0 and 1, is chosen as the structural alphabet. A second
alphabet which plays a very important role in the structural synthesis
stage consists of the symbols of the internal states of the selected
memory elements; we term it the state alphabet. The state alphabet
need not coincide with the structural alphabet; in practice, however,
the binary alphabet is usually chosen as the state alphabet as well.

One of the main problems solved in the course of the structural
synthesis of automata is writing out the so-called canonical
equations, which relate the signals applied to the inputs of the
memory elements to the output signals of these elements and to the
signals applied to the input of the automaton as a whole. To ensure
that the circuit functions properly, we cannot allow the input signals
applied to the memory elements to take part directly in forming output
signals which, through the feedback circuits, would be applied to these
same inputs at the same instant of time. In other words, the memory
elements must be Moore automata and not Mealy automata. The composite
automaton formed from these elements, however, can of course be either
a Moore or a Mealy automaton.

Let us assume that the elementary automata with memory used in the
structural synthesis are Moore automata. After a corresponding shift in
the reference of the time intervals for the output signals, we shall
consider that the output signal of every memory element at any instant
of time t is determined by the internal state of that element at the
same instant t.

To be able to synthesize arbitrary automata with the smallest
expenditure of memory elements, it is advisable to select as such
elements Moore automata having a complete system of transitions and a
complete system of outputs, which for brevity we shall term complete
automata. Completeness of the system of transitions means that for any
pair of states of the automaton there is an input signal which
transforms the first state of the pair into the second. This
requirement is equivalent to the requirement that every column of the
switching table contain all the states of the automaton. Completeness
of the system of outputs for a Moore automaton means that each state is
assigned its own output signal, different from the output signals of
the other states. Therefore, for complete Moore automata it is natural
simply to identify the output signals with the corresponding internal
states. We shall adhere to this convention henceforth.
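The completeness of the system of transitions can be tested directly from a switching table. A minimal sketch, assuming the table is a dictionary `delta[(a, x)]` (layout and names are our own):

```python
from typing import Dict, Hashable, Iterable, Tuple


def has_complete_transitions(states: Iterable[Hashable], inputs: Iterable[Hashable],
                             delta: Dict[Tuple[Hashable, Hashable], Hashable]) -> bool:
    """True if every column of the switching table (the successors of one
    state a under all inputs) contains every state of the automaton."""
    states, inputs = list(states), list(inputs)
    return all({delta[a, x] for x in inputs} == set(states) for a in states)
```

For instance, a delay element, whose next state equals its input, passes the test, while an element that only holds its state does not.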

Let us choose as the memory element some complete Moore automaton B.
The internal states of this automaton are denoted by
$z_1, z_2, \ldots, z_r$ $(r \ge 2)$. By the convention just adopted, they
are also its output signals. For the input signals of the memory
element we shall use the letters $s_1, s_2, \ldots, s_q$. The alphabet
$(z_1, z_2, \ldots, z_r)$ is nothing other than the state alphabet. We
shall select the structural alphabet somewhat later; for the moment we
shall show how to find the so-called canonical equations of an
automaton whose memory is composed of elements of the selected type.

Assume that we are given the abstract finite Mealy or Moore automaton A
with the input alphabet $X = (x_1, x_2, \ldots, x_n)$, the output
alphabet $Y = (y_1, y_2, \ldots, y_m)$ and the set of internal states
$A = (a_1, a_2, \ldots, a_p)$, specified by the switching function
$\delta(a, x)$ and the output function $\lambda(a, x)$. Let the selected
memory element B be given by the switching function $\psi(z, s)$. We pose
the problem of finding the canonical equations of the automaton A under
the condition that its memory is constructed from several copies
$B^{(1)}, B^{(2)}, \ldots, B^{(k)}$ of the elementary automaton B.

To construct the automaton A, the number k of memory elements must
satisfy the condition $r^k \ge p$. In this case the different internal
states $a_i$ can be identified with different sets of states of the
automata $B^{(1)}, B^{(2)}, \ldots, B^{(k)}$. This identification will be
termed the coding of the states of the automaton A in the chosen state
alphabet. Of course, the coding is in essence not unique. However, we
shall not go into the details of selecting a particular coding
variant, but shall assume that this variant is already given.

After coding, the states of the automaton A are designated by the
k-dimensional vectors $(z^{(1)}, z^{(2)}, \ldots, z^{(k)})$, whose
components are letters of the state alphabet $z_1, z_2, \ldots, z_r$;
the two-place output function $\lambda(a, x)$ of the automaton A becomes
the (k + 1)-place function $\lambda(z^{(1)}, z^{(2)}, \ldots, z^{(k)}, x)$,
which as before we shall call the output function. The counterpart of
the switching function $\delta(a, x)$ after coding is the system of k
(k + 1)-place functions $\varphi^{(i)}(z^{(1)}, z^{(2)}, \ldots, z^{(k)}, x)$
describing the transitions of the individual memory elements. The
function $\varphi^{(i)}$ determines the state into which the i-th memory
element must pass at the instant t + 1 if at the instant t the
automaton A was in the state $(z^{(1)}, z^{(2)}, \ldots, z^{(k)})$ and the
signal x was applied to its input $(i = 1, 2, \ldots, k;\ t = 0, 1, 2, \ldots)$.
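The coding step can be sketched as follows, assuming a switching table `delta[(a, x)]` and a one-to-one code that assigns each state of A a k-tuple of element states; the layout and the function names are our own. The first helper finds the least k with $r^k \ge p$; the second splits the coded switching function into the k component functions $\varphi^{(i)}$.

```python
from typing import Dict, Hashable, List, Tuple


def min_memory_elements(p: int, r: int) -> int:
    """Smallest k with r**k >= p: enough r-state elements for p states."""
    k = 0
    while r ** k < p:
        k += 1
    return k


def component_functions(delta: Dict[Tuple[Hashable, Hashable], Hashable],
                        code: Dict[Hashable, Tuple],
                        inputs: List[Hashable]) -> List[Dict]:
    """Split the coded switching function into k functions phi[i].

    phi[i][(z, x)] is the state that memory element i must enter at t + 1
    when the memory is in the coded state z (a k-tuple) and x is applied at t.
    """
    if len(set(code.values())) != len(code):
        raise ValueError("the coding must be one-to-one")
    k = len(next(iter(code.values())))
    phi: List[Dict] = [{} for _ in range(k)]
    for a, z in code.items():
        for x in inputs:
            z_next = code[delta[a, x]]
            for i in range(k):
                phi[i][z, x] = z_next[i]
    return phi
```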

The next important step is the construction of the excitation
functions $\sigma^{(i)}(z^{(1)}, z^{(2)}, \ldots, z^{(k)}, x)$ of the memory
elements $(i = 1, 2, \ldots, k)$. The value of each such function
$\sigma^{(i)}$ for a given state $(z^{(1)}, z^{(2)}, \ldots, z^{(k)})$ of the
automaton A and a given input signal x is defined as an input signal
$s^{(i)}$ of the i-th memory element which causes in that element the
transition prescribed by the i-th transition function, i.e., the
transition $z^{(i)} \to \varphi^{(i)}(z^{(1)}, z^{(2)}, \ldots, z^{(k)}, x)$
$(i = 1, \ldots, k)$. The input signal $s^{(i)}$ can in general be chosen
in more than one way. Therefore the construction of the excitation
functions of the memory elements from their transition functions is
not unique.

The choice of the best way of constructing the excitation functions is
connected with the problems of the following stage, the stage of
combinational synthesis. However, for many types of memory elements
(delay lines, triggers with a complementing input, etc.) the required
transition is produced by a unique input signal. For several types of
elements one can indicate hybrid combinatorial-computational techniques
which find the excitation functions more simply than the general
technique described above [23].
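Finding an excitation amounts to inverting the element's switching table for a required transition $z \to z'$. In the sketch below (names and table layout `psi[(z, s)]` are our own), the delay element and the complementing trigger each admit exactly one input signal per transition, whereas a three-input hold/set/reset element, introduced here purely as a hypothetical illustration, sometimes admits two; this is the non-uniqueness noted in the text.

```python
from typing import Dict, Hashable, List, Tuple

Switch = Dict[Tuple[int, Hashable], int]

# Switching tables psi[(z, s)] of two-state memory elements.
DELAY: Switch = {(z, s): s for z in (0, 1) for s in (0, 1)}          # next state = input
T_TRIGGER: Switch = {(z, s): z ^ s for z in (0, 1) for s in (0, 1)}  # input 1 complements
RS: Switch = {}                          # hypothetical hold/set/reset element
for z in (0, 1):
    RS[z, "h"], RS[z, "s"], RS[z, "r"] = z, 1, 0


def excitations(psi: Switch, z: int, z_next: int) -> List[Hashable]:
    """All input signals s that carry a memory element from z to z_next."""
    return [s for (q, s), q_next in psi.items() if q == z and q_next == z_next]
```

For the delay element the excitation simply equals the required next state; for the complementing trigger it is the sum modulo 2 of the present and next states.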

The excitation functions $\sigma^{(i)}$, equated to the input signals
$s^{(i)}$ which they determine, then give the sought canonical equations
for the feedback loops of the automaton A:
$s^{(i)} = \sigma^{(i)}(z^{(1)}, z^{(2)}, \ldots, z^{(k)}, x)$ $(i = 1, 2, \ldots, k)$.
However, these equations are still not in a completely satisfactory
form: the states of the automaton A are coded in the state alphabet,
which is universal for the given type of memory elements and does not
depend on the choice of the automaton A, whereas the input and output
signals are designated in various alphabets, including some that
depend on the choice of the automaton A. It is therefore necessary to
fix the structural alphabet, usually determined by the coding actually
chosen for the input signals of the memory elements, and to code by
finite sequences of letters of this alphabet not only the input
signals $s^{(i)}$ of the memory elements but also the input and output
signals x and y of the automaton as a whole. This coding transforms the
system of excitation functions $\sigma^{(i)}$ found above into the new
system of functions

      $\sigma_j^{(i)}(z^{(1)}, z^{(2)}, \ldots, z^{(k)}, u_1, u_2, \ldots)$   $(i = 1, \ldots, k;\ j = 1, 2, \ldots)$,

where each function $\sigma_j^{(i)}$ gives the input signal (a letter of
the structural alphabet) which must be applied to the j-th input
channel of the i-th memory element when the automaton A is in the
state $(z^{(1)}, z^{(2)}, \ldots, z^{(k)})$ and the signals (letters of the
structural alphabet) $u_1, u_2, \ldots$ are applied to its input
channels.


Similarly, the output function $\lambda(z^{(1)}, z^{(2)}, \ldots, z^{(k)}, x)$
found above is replaced by the system of functions
$\lambda_j(z^{(1)}, z^{(2)}, \ldots, z^{(k)}, u_1, u_2, \ldots)$ $(j = 1, 2, \ldots, h)$,
where the function $\lambda_j$ determines the output signal (a letter of
the structural alphabet) appearing on the j-th output channel of the
automaton A when the automaton A is in the state
$(z^{(1)}, z^{(2)}, \ldots, z^{(k)})$ and the signals $u_1, u_2, \ldots$ are
applied to its input channels.

We term the resulting functions $\sigma_j^{(i)}$ and $\lambda_j$
respectively the structural excitation functions and the structural
output functions of the automaton A.


In the case usually encountered in practice, when both the structural
alphabet and the state alphabet are binary, the structural excitation
functions and the structural output functions can be regarded as
ordinary boolean functions. The problem of the following stage (the
stage of combinational synthesis) amounts to actually constructing the
functions found from the elementary logic functions realized by the
selected logic elements. The methods of solving this problem were
discussed in §4 of the preceding chapter.

In the majority of modern digital automata, complete Moore automata
with two internal states are used as memory elements. It is
interesting to find how many such elementary automata there are and
which ones they are. Let us consider the case in which the complete
Moore automaton with the two states 0 and 1 has only two input
signals, x and y. From the completeness conditions it follows that
each column of the switching table of the automaton must contain both
states, 0 and 1. This limitation leaves only 4 possible switching
tables in all:

        | 0  1        | 0  1        | 0  1        | 0  1
      --+------     --+------     --+------     --+------
      x | 0  0      x | 0  1      x | 1  1      x | 1  0
      y | 1  1      y | 1  0      y | 0  0      y | 0  1


After renaming of the input signals, the third table coincides with
the first and the second with the fourth. Thus there are only two
essentially different automata of the required type, given by the
switching tables

        | 0  1        | 0  1
      --+------     --+------
      x | 0  0      x | 0  1
      y | 1  1      y | 1  0

If we set x = 0, y = 1, the first table gives the well-known memory
element termed the (one-cycle) delay element, and the second gives the
equally familiar element termed the complementing trigger. We note that
a conventional electromagnetic relay with make (normally open) contacts
can be regarded as a delay element.
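The count of four tables and two essentially different automata can be confirmed by brute force. In this sketch (our own; names hypothetical) a table is a pair of rows (x-row, y-row), each row listing the next states from states 0 and 1, and renaming the inputs means swapping the two rows.

```python
from itertools import product

STATES = (0, 1)


def complete_two_input_tables():
    """Switching tables (x-row, y-row) of a two-state automaton with inputs
    x and y in which every column contains both states 0 and 1."""
    tables = []
    for row_x, row_y in product(product(STATES, repeat=2), repeat=2):
        # Column z of the table is the pair (row_x[z], row_y[z]).
        if all({row_x[z], row_y[z]} == set(STATES) for z in STATES):
            tables.append((row_x, row_y))
    return tables


def up_to_renaming(table):
    """Canonical form under renaming of the inputs (swapping x and y)."""
    row_x, row_y = table
    return min((row_x, row_y), (row_y, row_x))
```

The two canonical forms obtained, `((0, 0), (1, 1))` and `((0, 1), (1, 0))`, are the delay element and the complementing trigger respectively.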
With an increase in the number of input signals, new types of memory
elements appear: the trigger with separate inputs, given by the
switching table


the mixed trigger, given by the table


and others.

With only two internal states it is useless to increase the number of
input signals beyond four: a two-state automaton admits only $2^2 = 4$
distinct ways in which an input signal can act on its states, so with
a larger number some input signals will begin to duplicate one another
(cause identical transitions in the automaton). It is therefore easy to
compose a catalog of all the complete, essentially different Moore
automata with two states (we consider as essentially different those
automata whose switching tables cannot be converted into one another
by renaming the input signals). In addition to the four types of
automata already listed, the catalog contains three more automata,
given by the switching tables
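The catalog can be reproduced by a short enumeration. An input signal of a two-state automaton can act in only four ways; in our own hypothetical labels these are hold, flip, reset and set, each written as (next state from 0, next state from 1). Since duplicate inputs add nothing, an automaton up to renaming of inputs is a set of distinct maps, and it is complete when each column contains both states. The sketch does not try to decide which of the sets found correspond to the trigger with separate inputs or the mixed trigger.

```python
from itertools import combinations

# The four possible transition maps of one input signal:
# (next state from state 0, next state from state 1).
MAPS = {"hold": (0, 1), "flip": (1, 0), "reset": (0, 0), "set": (1, 1)}


def is_complete(maps):
    """Every column (current state z) must contain both states 0 and 1."""
    return all({m[z] for m in maps} == {0, 1} for z in (0, 1))


def catalog():
    """Complete two-state Moore automata, up to renaming of the inputs."""
    found = []
    for n in range(1, len(MAPS) + 1):
        for names in combinations(MAPS, n):
            if is_complete([MAPS[k] for k in names]):
                found.append(names)
    return found
```

The enumeration finds seven sets: two with two inputs (the delay element as reset/set and the complementing trigger as hold/flip), four with three inputs, and one with all four, in agreement with the count given above.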
Of course, each of the listed types of automata admits various
modifications resulting from different codings of the input signals in
the binary alphabet. Let us consider as an example the complete
synthesis (abstract synthesis, and structural synthesis up to the
determination of the canonical equations) of the Moore automaton A
which is a sequential binary squarer. The automaton A operates as
follows: a two-digit binary whole number is applied to its input digit
by digit, lower digit first (followed by two zeros, which give the
automaton time to output all four digits of the square). The square of
this number must appear at the output, also sequentially, beginning
with the lower digits. In other words, the automaton A must realize the
following partial mapping $\varphi$:


      0000 → 0000
      1000 → 1000
      0100 → 0010
      1100 → 1001.


It is not difficult to verify that the mapping $\varphi$, extended to the
initial segments of the words, satisfies the automaticity condition.
Denoting the zero signal at the input by the letter x and at the output
by the letter u, and correspondingly the one signal at the input by y
and at the output by v, we write this correspondence in the form

      xxxx → uuuu
      yxxx → vuuu
      xyxx → uuvu
      yyxx → vuuv


In the resulting correspondence the output signal u represents the
event

      $R = x \vee xx \vee xxx \vee xxxx \vee yx \vee yxx \vee yxxx \vee xy \vee xyxx \vee yy \vee yyx$,

and the output signal v represents the event $Q = xyx \vee y \vee yyxx$.
Words which occur in neither R nor Q are forbidden. By using the
forbidden words we can extend these events without risk of impairing
the reaction of the synthesized automaton to the specified words.

Since the events R and Q are disjoint, on the basis of what has been
said we can replace the event R by the complement Q' of the event Q.
The automaton then has to synthesize only the single event Q; the
event Q' is automatically represented by the set of all unlabeled
states of this automaton. Keeping in mind that the symbols u and v will
in any case be used to label the states, we denote the event Q by the
letter v and its complement Q' by the letter u.
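The events R and Q can be derived mechanically from the mapping. The sketch below (names hypothetical) computes $\varphi$ arithmetically, rewrites 0/1 as x/y on the input and u/v on the output, and attaches to each prefix of an input word the output letter produced at that step. The check that no prefix receives both letters is a consequence of the automaticity of $\varphi$.

```python
def square_word(bits: str) -> str:
    """phi on 4-letter words: two digits (lower first) plus two padding zeros."""
    n = int(bits[1] + bits[0], 2)           # digits arrive lower digit first
    return format(n * n, "04b")[::-1]        # square, also lower digit first


def events():
    """Prefix sets R (current output letter u) and Q (current output letter v)."""
    to_in, to_out = {"0": "x", "1": "y"}, {"0": "u", "1": "v"}
    R, Q = set(), set()
    for bits in ("0000", "1000", "0100", "1100"):
        p = "".join(to_in[b] for b in bits)
        q = "".join(to_out[b] for b in square_word(bits))
        for i in range(1, len(p) + 1):
            # The i-th output letter is attached to the i-letter prefix of p.
            (Q if q[i - 1] == "v" else R).add(p[:i])
    if R & Q:
        raise ValueError("a prefix received both u and v: mapping not automatic")
    return R, Q
```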
The process of abstract synthesis leads to the following marking of
the regular expression for the event Q = v:


Here the same basic indices are assigned to a pair of corresponding
places (index 1) and to a pair of similar places (index 4). The labeled
switching table of the automaton corresponding to this marking is
written











Since the initial state 0 represents only the empty word, its label can
be regarded as undetermined.

After transformation of the state (*) we obtain the table:


The set of states of the Moore automaton given by the last table can
be divided into two 0-classes: $a_0 = (0, 2, 3, 5, 6, 7)$ and
$b_0 = (1, 4)$. Let us construct the 0-table and from it determine the
1-classes:

        | 0   1   2   3   4   5   6   7
      --+-------------------------------
        | a0  b0  a0  a0  b0  a0  a0  a0
      x | a0  a0  a0  b0  a0  a0  b0  a0
      y | b0  a0  a0  a0  a0  a0  a0  a0

      $a_1 = (0)$, $b_1 = (1, 4)$, $c_1 = (2, 5, 7)$, $d_1 = (3, 6)$.
We construct the 1-table and determine the 2-classes:

        | 0   1   2   3   4   5   6   7
      --+-------------------------------
        | a1  b1  c1  d1  b1  c1  d1  c1
      x | c1  c1  c1  b1  c1  d1  b1  c1
      y | b1  c1  d1  c1  c1  c1  c1  c1

      $a_2 = (0)$, $b_2 = (1, 4)$, $c_2 = (2)$, $d_2 = (3, 6)$, $e_2 = (5)$, $g_2 = (7)$.


The 2-table and the 3-classes have the form (entries that cannot be
read in the source are marked ?):

        | 0   1   2   3   4   5   6   7
      --+-------------------------------
        | a2  b2  c2  d2  b2  e2  d2  g2
      x | c2  ?   ?   b2  g2  d2  b2  ?
      y | b2  e2  d2  g2  ?   g2  g2  g2

      $a_3 = (0)$, $b_3 = (1)$, $c_3 = (2)$, $d_3 = (3, 6)$, $e_3 = (4)$, $f_3 = (5)$, $g_3 = (7)$.





Since the 3-classes coincide, as is easily verified, with the
4-classes, they are also the ∞-classes, so that the automaton can be
constructed by combining states 3 and 6. After a corresponding
renumbering of the states, the labeled switching table of the desired
Moore automaton A is written





Going on to the structural synthesis, we choose as the memory element
a delay line with the switching table

        | 0  1
      --+------
      0 | 0  0
      1 | 1  1


Since the synthesized automaton has 7 states while the memory element
has 2 states, we must take 3 memory elements ($2^3 = 8 \ge 7$). Let us
denote their internal states by $z_1, z_2, z_3$ and their input signals
by $s_1, s_2, s_3$ (all these quantities take the values 0 and 1). We
denote the states of the automaton A by the values of the vector
$(z_1, z_2, z_3)$. Let us take the so-called natural coding of the
states, in which each state is coded by writing its number in binary
notation:

      1 = 001;  2 = 010;  3 = 011;  4 = 100;  5 = 101;  6 = 110;  7 = 111.
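The natural coding is simply the binary notation of the state numbers. A minimal sketch (function name hypothetical), taking $z_1$ as the most significant digit and k as the bit length of the largest state number:

```python
from typing import Dict, Iterable, Tuple


def natural_code(states: Iterable[int]) -> Dict[int, Tuple[int, ...]]:
    """Natural coding: state number n -> its binary digits (z1, ..., zk),
    z1 being the most significant digit."""
    states = list(states)
    k = max(max(states).bit_length(), 1)
    return {n: tuple((n >> (k - 1 - i)) & 1 for i in range(k)) for n in states}
```

For the seven states 1 to 7 this gives k = 3 and reproduces the codes listed above, for example 5 → (1, 0, 1).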

Let us denote the physical input signal of the automaton A by the
letter c and its physical output signal by d, and let us rewrite the
labeled switching table of this automaton in accordance with the chosen
coding (such a table is usually called the physical switching table of
the automaton, in contrast to the abstract switching table considered
previously, in which the particulars of the coding were not taken into
account).
The resulting physical switching table of the automaton gives the
states $z_1' = z_1(t + 1)$, $z_2' = z_2(t + 1)$, $z_3' = z_3(t + 1)$ of the
memory elements at each succeeding moment of time t + 1 as boolean
functions of their states $z_1 = z_1(t)$, $z_2 = z_2(t)$, $z_3 = z_3(t)$
and of the input signal $c = c(t)$ of the automaton A at the present
moment t. From the switching table of the delay element we see that its
state at any succeeding moment t + 1 coincides with the signal at its
input at the present moment. Therefore we can regard the table just
written out as giving the structural excitation functions of the sought
automaton A:

      $s_i = \varphi_i(z_1, z_2, z_3, c)$   $(i = 1, 2, 3)$.

We write out the tables of values which determine the functions
$s_1, s_2, s_3$ directly in the form of Karnaugh maps (see §4,
Chapter 2).

Using the methods developed in the preceding chapter, we find the
minimal disjunctive normal forms of the excitation functions (the input
signals of the memory elements) of the automaton A:

Sy = 2 VV 29; So = 23 VV 29 \V/ 216 VV 2402 53 = 24C VV 22029 V 
V 223 V C23 V 2,C. 

We also determine the structural output function (which gives the
output signal d of the automaton A as a function of the states of its
memory elements) directly from the labeled physical switching table of
the automaton A. The Karnaugh map for this function is written


From the Karnaugh map we easily find the minimal disjunctive normal
form, which gives the required output function d:
d = 212, V aaee. 

We note that all our functions turned out to be defined on only some of
the argument sets, and we have extended their definitions so as to
obtain the simplest possible representations.

Let us introduce as logic elements inverters, together with two-input
AND and OR elements. Denoting them by circles containing the symbols of
the corresponding operations ¬, ∧, ∨, and denoting the memory (delay)
elements by squares with the letters $z_1$, $z_2$ and $z_3$ inside, we
obtain the structural diagram of the automaton A shown in Fig. 9. In
addition to the three delay elements, this circuit contains 4
inverters, 5 AND elements and 7 OR elements (16 logic elements and 3
delay elements in all).
Chapter 4
SELF-ORGANIZING SYSTEMS 
§1. CONCEPT OF SELF-ALTERATION AND SELF-ORGANIZATION IN AUTOMATA 

The concept of the algorithm and of the discrete automaton is
generally associated with the idea of their invariability in time.
Their corresponding alphabetic representations are pictured as
something rigid, specified once and for all. With respect to the
classical concept of the algorithm and to the usual understanding of
how a discrete automaton functions, this idea is to a certain degree
justified. At the same time, everyone knows that the most advanced
self-organizing systems are at present being simulated on
general-purpose electronic computers, which are nothing other than
discrete automata with a "rigid" structure, with the aid of programs,
which are in essence algorithms written in a special form.

The contradiction arising here is to a considerable degree only
apparent. The point is that the distinction between "rigid" and
"self-altering" information converters is in most cases quite
arbitrary and is determined not so much by the design of the converter
itself as by the organization of the experiment in which it is used.
The same information converter can under some conditions be regarded
as rigid and unchanging, and under other conditions as self-altering
and self-organizing.

To make these statements more precise, let us consider an arbitrary
discrete automaton A with the input alphabet X, the output alphabet Y,
the set of internal states A and the initial state $a_0$. The usual
picture of the functioning of the automaton implicitly presumes that,
after some word p in the alphabet X has been applied to its input and
the corresponding output word $q = \varphi(p)$ in the alphabet Y has been
obtained, the automaton returns to the initial state. Thus, at the
moment when each new input word begins to be applied, the automaton is
always in the same state $a_0$. As a result, the mapping $\varphi$ induced
by the automaton is rigid, unalterable, in the sense that the result of
converting any input word p by the mapping $\varphi$ depends only on the
word p itself and not on the moment of time at which it was applied to
the automaton input.

We will term each possible input word of the automaton a question and
the corresponding output word a response. The "rigidity" of the
automaton then amounts to the fact that to a given question it always
and under all conditions gives the same response. The automaton is
thereby deprived of any capability for learning and for improving its
responses.

However, the transition of the automaton into the initial state before each new question, described above, is not at all mandatory. Moreover, it is not specified directly in the definition of the functioning of the automaton given in §6 of Chapter 3. It is natural to define the functioning of the automaton so that each succeeding question finds the automaton in the state in which it was left at the end of the response to the preceding question. With this definition, the automaton which we previously considered rigid and unalterable will, generally speaking, change its responses in the course of time and can, in particular, be self-learning, self-improving, etc.
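The difference between the two ways of operating the same automaton can be sketched in a few lines of Python. This is a minimal illustration under assumed conventions: the automaton is given as two dictionaries (hypothetical names `delta` for the switching function and `lam` for the output function), and the two-state "toggle" automaton is an invented example, not one taken from the text.

```python
from typing import Dict, List, Tuple

# Hypothetical table encoding: delta[(state, letter)] -> next state,
# lam[(state, letter)] -> output letter.
Table = Dict[Tuple[str, str], str]


def run_word(delta: Table, lam: Table, state: str, word: str) -> Tuple[str, str]:
    """Feed `word` letter by letter; return (output word, final state)."""
    out = []
    for letter in word:
        out.append(lam[(state, letter)])
        state = delta[(state, letter)]
    return "".join(out), state


def answer_questions(delta: Table, lam: Table, a0: str,
                     questions: List[str], reset: bool) -> List[str]:
    """reset=True: return to a0 before each question (the 'rigid' usage).
    reset=False: each question starts in the state left by the previous one."""
    state = a0
    responses = []
    for q in questions:
        if reset:
            state = a0
        response, state = run_word(delta, lam, state, q)
        responses.append(response)
    return responses


# Invented two-state example: letter "x" toggles the state,
# and the output letter reveals the state in which "x" was read.
delta = {("0", "x"): "1", ("1", "x"): "0"}
lam = {("0", "x"): "a", ("1", "x"): "b"}

print(answer_questions(delta, lam, "0", ["x", "x", "x"], reset=True))   # ['a', 'a', 'a']
print(answer_questions(delta, lam, "0", ["x", "x", "x"], reset=False))  # ['a', 'b', 'a']
```

With the reset, the same question always receives the same response; without it, the response to "x" changes from one question to the next, which is the self-altering behaviour described above.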

Let us now define more precisely the method of functioning of the discrete automaton as applied to the theory of self-organizing systems. The concepts of the cycle and of information cycling are extremely important in defining the method of functioning of the automaton.

Assume that some (finite or infinite) sequence of letters $x_{i_1}, x_{i_2}, \dots, x_{i_n}, \dots$ is applied to the automaton input. To it there corresponds some sequence of letters $y_{j_1}, y_{j_2}, \dots, y_{j_n}, \dots$ at the automaton output. Let us assume that we have identified some increasing sequence $k, l, m, \dots$ of moments of discrete time ($1 \le k < l < m < \dots$). Then each pair of words

$$(x_{i_1} \dots x_{i_k},\ y_{j_1} \dots y_{j_k}),\quad (x_{i_{k+1}} \dots x_{i_l},\ y_{j_{k+1}} \dots y_{j_l}),\quad (x_{i_{l+1}} \dots x_{i_m},\ y_{j_{l+1}} \dots y_{j_m}),\ \dots$$

will be termed a cycle, and the operation of identifying the cycles will be termed the information (input and output) cycling operation for the automaton in question.

In what follows we will assume that for each automaton under consideration a particular class of admissible sequences of input letters has been identified, and that each such sequence (together with the corresponding sequence of output letters) is partitioned into cycles. The cycling operation is thus defined, generally speaking, not on a single pair of sequences but on all pairs of admissible sequences.

In the abstract approach to the concepts of the cycle and cycling, no meaning is involved beyond what has already been defined. In practice, however, cycling always presumes that the pair of words (input and output) composing each abstract cycle is in some sense a complete real cycle of the automaton's functioning, which can be considered separately from the remaining cycles. Two cases are encountered most frequently: the case when the first word of the pair (the input word) is a question posed to the automaton and the second word is the response to this question, and the case when the second word of the pair is, as before, the response, but the first includes, in addition to the question, an evaluation of the given response. Of course, in both cases it is assumed that empty letters which may occur in either the first or the second word need not be taken into consideration.

The situation which arises in these two cases is shown in Figs. 10 and 11 respectively. We note that in the general case it is frequently advisable in the design of automata to provide for partial (and sometimes even complete) overlapping of the response and the question (beginning the response before the question is terminated). This situation is reflected in Figs. 10 and 11.

Fig. 10. 1) Cycle; 2) question; 3) response.

Fig. 11. 1) Cycle; 2) question; 3) response; 4) evaluation.

In performing the cycling operation, the boundaries of the cycles are determined mainly by two methods. First, we can simply fix some natural number k and require that the input and output words in each cycle contain exactly k letters (empty letters included). We will term this k-cycling. Second, we can define the boundaries of the cycles by fixing
for this purpose a special letter or word, termed a label. For the 
separation of the cycles it is most convenient to place such a label 
at the beginning of each successive question (here we will consider 
that the label is a part of the question). We will agree to call this 
method of cycling label cycling. It is obvious that the combination 
of letters fixed as labels must be used exclusively for this purpose. 
A label can also be used within a cycle (for example, for indicating 
the beginning of an evaluation), but this label must be different from 
the label which indicates the boundary of the cycle. In the design of 
the automaton it is frequently convenient to provide for the automaton to put out a special label at the end of each response. We will consider that only admissible sequences of input signals are encountered in the operation of the automaton, and that for each such sequence the corresponding partition into cycles has been performed.
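Both cycling methods can be sketched as functions that split an aligned pair of input and output letter sequences into cycles. The sketch assumes that each input letter produces exactly one output letter, that "-" stands for the empty letter and "#" for the label; the function names, and the choice to drop an unfinished last block under k-cycling, are illustrative assumptions rather than anything fixed by the text.

```python
from typing import List, Tuple

Cycle = Tuple[str, str]  # (input word, output word)


def k_cycling(inp: str, out: str, k: int) -> List[Cycle]:
    """Cut aligned input/output sequences into cycles of exactly k letters
    (empty letters included). An unfinished last block is dropped."""
    if len(inp) != len(out):
        raise ValueError("expected one output letter per input letter")
    n_cycles = len(inp) // k
    return [(inp[i * k:(i + 1) * k], out[i * k:(i + 1) * k]) for i in range(n_cycles)]


def label_cycling(inp: str, out: str, label: str) -> List[Cycle]:
    """Start a new cycle wherever the input carries `label`; the label is
    kept as the first part of the question it opens."""
    if len(inp) != len(out):
        raise ValueError("expected one output letter per input letter")
    starts = [i for i in range(len(inp)) if inp.startswith(label, i)]
    if not starts or starts[0] != 0:
        raise ValueError("an admissible sequence must begin with the label")
    bounds = starts + [len(inp)]
    return [(inp[s:e], out[s:e]) for s, e in zip(bounds, bounds[1:])]


print(k_cycling("xyxyx", "uvuvu", 2))
# [('xy', 'uv'), ('xy', 'uv')]
print(label_cycling("#ab#b#aa", "-xy-z-xx", "#"))
# [('#ab', '-xy'), ('#b', '-z'), ('#aa', '-xx')]
```

Because the label marks boundaries, it must never occur inside a question or response; this is why the combination chosen as a label has to be reserved for that purpose alone.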

The ordered sequence of cycles preceding a given cycle C in a particular fixed admissible sequence of the automaton A is termed the learning history of the automaton A for the cycle C.

It is natural to term an automaton self-improving or self-learning if it improves its responses as its learning history lengthens. This definition, of course, makes no claim to exactness and must be considered preliminary. The definitions of the improvement (self-learning) of automata will be made more precise in one of the following sections, after the probability-theoretic concepts necessary for such definitions have been considered. It is useful, however, to indicate here the directions of this refinement. First of all, the concept of the quality of a response must be made precise by introducing some numerical evaluation of the response. Under this condition we can give an exact meaning to the concept of improvement of response quality used above in the definition of self-improvement of automata.

Further, we must keep in mind that even automata with the most clearly marked tendency toward self-improvement do not necessarily improve their responses to absolutely all questions. Here we must consider the improvement of the quality of the responses on the average. The same is true of the learning history. Some relatively infrequently encountered learning histories can obviously lead to deterioration of the average quality of the responses; however, if the remaining learning histories lead to a sharp improvement of the response quality, the automaton as a whole can be considered self-improving.

Finally, it is obvious that we must differentiate between self-improvement prespecified ahead of time by the automaton designer (regardless of the form of the learning history) and genuinely self-triggered self-improvement, which is determined by the learning history that actually takes place and which therefore is not planned ahead of time. Clearly only the second type of self-improvement actually deserves this name. In the first type, the designer in effect places the correct responses in the automaton ahead of time but, in order to simulate the process of self-improvement, forces the automaton to keep this information in reserve for a certain time. As a result, the automaton at first gives responses of poor quality, and only after some period (some number of cycles) does it begin, using the information stored in it, to give correct answers. We can hardly term this sort of improvement of the automaton's responses with time self-improvement.

All that we have said gives an idea of the difficulties which must be overcome in giving an exact definition of the concept of self-improvement. The concept of self-organization, which in our view is somewhat more general than that of self-improvement, is in a similar situation. Self-improvement necessarily involves improvement of the quality of the responses. With self-organization the quality of the responses may not be defined at all; it is only necessary that in the course of learning the automaton, on the average, increases the definiteness of its responses. The corresponding refinement of the definition will be given after the necessary probability-theoretic concepts have been introduced.
In refining the concepts of self-organization and self-improvement it is convenient to make use of the so-called cyclic reduction of automata. Cyclic reduction is defined for automata in which the set of admissible input sequences is fixed and the cycling of the input and output information has been performed. When these conditions are satisfied, the input and output alphabets of any automaton A can be replaced as follows: the letters of the new input alphabet X' are taken to be all the distinct input words of all cycles in all admissible sequences, and the letters of the new output alphabet Y' are similarly taken to be all the distinct output words of these cycles.

For any state a of the automaton A and any letter x' of the alphabet X' (the input word of some cycle), we use δ'(a, x') to denote the state into which the automaton passes from the state a under the action of the input word x'. We use λ'(a, x') to denote the output word delivered by the automaton A under the action of the input word x' when the state a is taken as the initial state. Any admissible input sequence of the automaton A can be considered as a sequence x'(1) x'(2) ... of letters of the new input alphabet X'. Let us consider the set A' of all those states of the automaton A into which it can be switched from the initial state $a_0$ by input words of the form x'(1) x'(2) ... x'(k) (k ≥ 0), i.e., by all possible initial segments, made up of whole cycles, of the various admissible input sequences. The initial state $a_0$ itself necessarily belongs to this set.

It is now easy to construct the automaton A' whose set of internal states is the set A', whose input alphabet coincides with the set X', and whose output alphabet coincides with the set Y'. The switching and output functions of this automaton will be the functions δ' and λ' defined above, and the initial state will be the state $a_0$. It is assumed that only admissible input sequences (rewritten in the alphabet X') will be applied to the input of the constructed automaton. In this case, as is easily verified, the definition of the automaton A' is entirely correct: there are enough states and output letters to describe completely the functioning of the automaton under the influence of any admissible input sequence.

We agree to term the automaton A' thus constructed the cyclic reduction of the original automaton A. Obviously the information cycling in the automaton A' will be 1-cycling. In other words, in the cyclic reduction of any automaton both the questions and the responses consist of single letters.

Under cyclic reduction the number of internal states of an automaton can only diminish or, at most, remain unchanged. The number of letters of the input and output alphabets will, generally speaking, increase. It is clear that with k-cycling of the original information, cyclic reduction cannot cause a transition from finite alphabets to infinite ones. In the case of label cycling, however, such a transition is entirely possible, since the cycles may then be of unbounded length: after cyclic reduction a finite input or output alphabet may be transformed into an infinite one.
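For k-cycling with all input sequences admissible, the construction of A' can be sketched as a breadth-first search from $a_0$ over the k-letter input words. The function and variable names are hypothetical, the tables are plain dictionaries, and the sketch covers only k-cycling, since under label cycling the new alphabets may be infinite.

```python
from collections import deque
from itertools import product


def run(delta, lam, state, word):
    """Apply `word` from `state`; return (output word, final state)."""
    out = []
    for letter in word:
        out.append(lam[(state, letter)])
        state = delta[(state, letter)]
    return "".join(out), state


def cyclic_reduction_k(delta, lam, a0, alphabet, k):
    """Cyclic reduction under k-cycling, every input sequence admissible.

    The letters of X' are the k-letter input words. Returns the reachable
    state set A' together with the tables delta'(a, x') and lambda'(a, x')."""
    words = ["".join(w) for w in product(alphabet, repeat=k)]
    states = {a0}
    queue = deque([a0])
    delta_r, lam_r = {}, {}
    while queue:
        a = queue.popleft()
        for w in words:
            response, b = run(delta, lam, a, w)
            delta_r[(a, w)] = b
            lam_r[(a, w)] = response
            if b not in states:
                states.add(b)
                queue.append(b)
    return states, delta_r, lam_r


# Invented example: a counter modulo 3 on the single letter "a",
# emitting "p" in state 0 and "q" in states 1 and 2.
delta = {(s, "a"): (s + 1) % 3 for s in range(3)}
lam = {(s, "a"): "p" if s == 0 else "q" for s in range(3)}

states2, _, _ = cyclic_reduction_k(delta, lam, 0, "a", 2)
states3, _, lam3 = cyclic_reduction_k(delta, lam, 0, "a", 3)
print(sorted(states2))        # [0, 1, 2]
print(sorted(states3), lam3)  # [0] {(0, 'aaa'): 'pqq'}
```

Under 3-cycling the reduced counter keeps a single state, while under 2-cycling it keeps all three; in neither case does the reduction have more states than the original automaton.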

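Let us consider as an example the cyclic reduction of the automaton A with the three states 1, 2, 3, the two input letters x, y and the two output letters u, v, whose switching and output functions are given by the respective tables

$$
\begin{array}{c|ccc}
\delta & 1 & 2 & 3\\ \hline
x & 2 & 2 & 2\\
y & 3 & 2 & 1
\end{array}
\qquad
\begin{array}{c|ccc}
\lambda & 1 & 2 & 3\\ \hline
x & u & v & u\\
y & u & u & v
\end{array}
$$

Assuming all input sequences admissible, taking state 1 as the initial state and using 2-cycling, as a result of the cyclic reduction we arrive at the automaton A' with the two states 1, 2, the four input letters $x'_1 = xx$, $x'_2 = xy$, $x'_3 = yx$, $x'_4 = yy$ and the four output letters $y'_1 = uu$, $y'_2 = uv$, $y'_3 = vu$, $y'_4 = vv$. The switching and output functions of this automaton are given by the respective tables

$$
\begin{array}{c|cc}
\delta' & 1 & 2\\ \hline
x'_1 & 2 & 2\\
x'_2 & 2 & 2\\
x'_3 & 2 & 2\\
x'_4 & 1 & 2
\end{array}
\qquad
\begin{array}{c|cc}
\lambda' & 1 & 2\\ \hline
x'_1 & y'_2 & y'_4\\
x'_2 & y'_1 & y'_3\\
x'_3 & y'_1 & y'_2\\
x'_4 & y'_2 & y'_1
\end{array}
$$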

Cyclic reduction gives a very graphic answer to the question of whether the automaton under consideration is rigid or self-altering with respect to the given cyclization. Indeed, the following proposition holds.

In order that the discrete automaton A with given cyclization be rigid (i.e., not alter its responses to the same question in the course of time), it is necessary and sufficient that after cyclic reduction the output function λ'(a, x') not depend on the states of the reduced automaton A'.

Independence of the output function from the states of the automaton obviously means that all the elements of each row of the automaton's output table are equal (elements standing in different rows can, of course, be different).

Applied to the example considered above, the proposition just formulated immediately discloses the self-variability of the automaton A under 2-cycling of its input information.
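The example and the rigidity criterion can be checked mechanically. The sketch below encodes the switching and output tables of automaton A as read from the example (in dictionaries named, hypothetically, `delta` and `lam`), performs the cyclic reduction for 2-cycling from state 1, and tests whether every row of the reduced output table is constant.

```python
from itertools import product

# Automaton A of the example: states 1, 2, 3; inputs x, y; outputs u, v.
delta = {(1, "x"): 2, (2, "x"): 2, (3, "x"): 2,
         (1, "y"): 3, (2, "y"): 2, (3, "y"): 1}
lam = {(1, "x"): "u", (2, "x"): "v", (3, "x"): "u",
       (1, "y"): "u", (2, "y"): "u", (3, "y"): "v"}


def run(state, word):
    """Apply `word` from `state`; return (output word, final state)."""
    out = []
    for letter in word:
        out.append(lam[(state, letter)])
        state = delta[(state, letter)]
    return "".join(out), state


def reduce_k(a0, k):
    """Cyclic reduction for k-cycling (all input sequences admissible)."""
    words = ["".join(w) for w in product("xy", repeat=k)]
    states, frontier, out_table = {a0}, [a0], {}
    while frontier:
        a = frontier.pop()
        for w in words:
            response, b = run(a, w)
            out_table[(a, w)] = response
            if b not in states:
                states.add(b)
                frontier.append(b)
    return sorted(states), words, out_table


def is_rigid(states, words, out_table):
    """Rigidity criterion: each row of the reduced output table is constant."""
    return all(len({out_table[(a, w)] for a in states}) == 1 for w in words)


states, words, out_table = reduce_k(1, 2)
print(states)  # [1, 2]
for w in words:
    print(w, [out_table[(a, w)] for a in states])
# xx ['uv', 'vv']
# xy ['uu', 'vu']
# yx ['uu', 'uv']
# yy ['uv', 'uu']
print(is_rigid(states, words, out_table))  # False
```

Every row of the reduced output table contains two different words, so the criterion reports that A is self-altering under 2-cycling.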

It is easy to see that an automaton whose output function does not depend on the states can be replaced by an equivalent automaton (i.e., one inducing the same alphabetic mapping) having a single internal state. An automaton with a single internal state is in essence an automaton without memory. In the abstract theory of automata it is shown that every discrete automaton A can be minimized, i.e., replaced by an equivalent automaton B (the absolute minimization of the automaton A) having the smallest number of states among all automata which induce the same alphabetic mapping as the automaton A.

If after the cyclic reduction of any discrete automaton A (with given cyclization of the information) we then perform an absolute minimization, we obtain an operation which we shall term the complete cyclic reduction of the automaton A (for the given cyclization). The following proposition follows from the above considerations.

In order that the discrete automaton A with given information cycling have the property that its responses are independent of time, it is necessary and sufficient that the complete cyclic reduction of the automaton A yield an automaton without memory.

In other words, the existence of a nontrivial memory in the automaton obtained by complete cyclic reduction of the automaton A means that the automaton A is self-adaptive (relative to the given cyclization).
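Complete cyclic reduction can be sketched as the k-cycling reduction followed by a standard Moore-style partition refinement, which counts the classes of equivalent states and hence the states of the minimal equivalent automaton. The helper names are hypothetical; the first test automaton is invented, the second is automaton A of the earlier example.

```python
from itertools import product


def run(delta, lam, state, word):
    """Apply `word` from `state`; return (output word, final state)."""
    out = []
    for letter in word:
        out.append(lam[(state, letter)])
        state = delta[(state, letter)]
    return "".join(out), state


def reduce_k(delta, lam, a0, alphabet, k):
    """Cyclic reduction for k-cycling, all input sequences admissible."""
    words = ["".join(w) for w in product(alphabet, repeat=k)]
    states, frontier, d_r, l_r = {a0}, [a0], {}, {}
    while frontier:
        a = frontier.pop()
        for w in words:
            l_r[(a, w)], d_r[(a, w)] = run(delta, lam, a, w)
            if d_r[(a, w)] not in states:
                states.add(d_r[(a, w)])
                frontier.append(d_r[(a, w)])
    return states, words, d_r, l_r


def count_min_states(states, words, d_r, l_r):
    """Partition refinement: number of classes of equivalent states."""
    # Start by grouping states with identical rows of the output table.
    block = {a: tuple(l_r[(a, w)] for w in words) for a in states}
    while True:
        # Split blocks whose members move to different blocks.
        keys = {a: (block[a],) + tuple(block[d_r[(a, w)]] for w in words)
                for a in states}
        ids = {}
        new_block = {a: ids.setdefault(keys[a], len(ids)) for a in states}
        if len(ids) == len(set(block.values())):
            return len(ids)
        block = new_block


def complete_reduction_is_memoryless(delta, lam, a0, alphabet, k):
    return count_min_states(*reduce_k(delta, lam, a0, alphabet, k)) == 1


# Invented example: "x" toggles two states but the output never changes.
toggle_d = {(0, "x"): 1, (1, "x"): 0}
toggle_l = {(0, "x"): "u", (1, "x"): "u"}
print(complete_reduction_is_memoryless(toggle_d, toggle_l, 0, "x", 1))  # True

# Automaton A of the earlier example under 2-cycling.
a_d = {(1, "x"): 2, (2, "x"): 2, (3, "x"): 2, (1, "y"): 3, (2, "y"): 2, (3, "y"): 1}
a_l = {(1, "x"): "u", (2, "x"): "v", (3, "x"): "u",
       (1, "y"): "u", (2, "y"): "u", (3, "y"): "v"}
print(complete_reduction_is_memoryless(a_d, a_l, 1, "xy", 2))  # False
```

The toggle automaton has two reachable states after cyclic reduction, but minimization merges them, so it is rigid; automaton A keeps a nontrivial memory and is therefore self-adaptive under 2-cycling.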

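The relativity of the property of self-adaptability of automata (its dependence on the method of cycling the input information) is easily illustrated by the example of the automaton C with two states, given by the switching and output tables

$$
\begin{array}{c|cc}
\delta & 1 & 2\\ \hline
x & 2 & 1\\
y & 2 & 1
\end{array}
\qquad
\begin{array}{c|cc}
\lambda & 1 & 2\\ \hline
x & u & v\\
y & u & 
\end{array}
$$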


For all admissible input sequences, under 2-cycling of the input information the cyclic reduction of the automaton C is an automaton with a single state. Thus, even without minimization we obtain an automaton without memory and, by the criterion formulated above, we conclude that the automaton C is rigid and invariable. At the same time, under 1-cycling of the input information the automaton C must obviously be considered self-adaptive. In practical problems rigid and self-adaptive automata can be distinguished quite easily, simply because the cyclization of the input information is prespecified.

We note that in the criteria considered, and in the examples discussed on the basis of these criteria, we spoke not of self-organization or self-improvement but only of self-adaptivity of automata. Examples of self-organization and self-improvement will be analyzed in later sections, after the corresponding mathematical basis has been created and precise definitions have been introduced.

In the remainder of the present section we shall consider one terminological question, namely, the use of the terms "system" or "automaton" in combination with the concepts of self-adaptation, self-organization, self-improvement and self-learning. As we see from the discussion already presented, all these concepts can be developed for discrete automata. However, with this approach we essentially lose the possibility of penetrating into the structure of the corresponding process (self-adaptation, self-organization, etc.).

The study of the structure of self-adaptation and self-organization processes is facilitated by representing such processes not in the form of individual automata (algorithms), but in the form of systems of automata (algorithms). In the simplest case such a system consists of two automata (algorithms). The first of these, termed the operational automaton (algorithm), directly processes the information applied to the system input. The second automaton (algorithm), termed the controlling or learning automaton (algorithm), evaluates the results of the functioning of the operational automaton (algorithm) and introduces suitable changes into it (in the case of automata, these changes are introduced into the switching and output functions of the operational automaton).
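The two-level scheme can be illustrated with a toy sketch. It is only an illustration under invented assumptions: the task (answering with the parity of the question length), the class names and the correction rule (switch to the next candidate response after a negative evaluation) are all hypothetical and are not taken from the text.

```python
from typing import Callable, Dict, List


class OperationalAutomaton:
    """Answers each question from a table; the table plays the role of the
    output function that the controlling automaton is allowed to change."""

    def __init__(self, candidates: List[str]):
        self.candidates = candidates
        self.table: Dict[str, int] = {}  # question -> index of current response

    def answer(self, question: str) -> str:
        return self.candidates[self.table.get(question, 0)]


class ControllingAutomaton:
    """Evaluates each response; after a negative evaluation it changes the
    operational automaton's table (here: moves to the next candidate)."""

    def __init__(self, evaluate: Callable[[str, str], bool]):
        self.evaluate = evaluate

    def supervise(self, op: OperationalAutomaton, question: str, response: str) -> None:
        if not self.evaluate(question, response):
            i = op.table.get(question, 0)
            op.table[question] = (i + 1) % len(op.candidates)


def parity_judge(question: str, response: str) -> bool:
    """Invented task: the correct response is the parity of the question length."""
    return response == ("even" if len(question) % 2 == 0 else "odd")


op = OperationalAutomaton(["odd", "even"])
ctrl = ControllingAutomaton(parity_judge)
history = []
for q in ["ab", "ab", "abc", "ab"]:
    r = op.answer(q)
    history.append(r)
    ctrl.supervise(op, q, r)
print(history)  # ['odd', 'even', 'odd', 'even']
```

A second-stage controller could, in the same spirit, evaluate and modify the correction rule used by the first-stage controller.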
Over the controlling automaton (algorithm) of the first stage there can be placed a controlling automaton (algorithm) of the second stage, whose function is to evaluate the operation of the automaton (algorithm) of the first stage and introduce the required changes into it. By analogy we can introduce controlling automata (algorithms) of the third, fourth, and any higher stages. We shall term the hierarchies of automata (algorithms) which arise in this fashion systems, and shall develop the concepts of self-adaptation, self-organization and self-improvement for them.

Of course, in the abstract sense any system of automata, no matter how complex, is equivalent to a single automaton; however, such a reduction of systems to individual automata makes it impossible to study certain properties of these systems that are of practical interest, primarily the laws governing the circulation of information within the system itself. Therefore, in what follows we shall deal not only with automata (algorithms) considered abstractly but also with systems of automata (algorithms) whose internal structure, i.e., the relations between the individual automata (algorithms) composing the system, is of particular interest for their study.

§2. SOME AUXILIARY INFORMATION FROM PROBABILITY THEORY

In the present section we will present certain information from probability theory which is needed for our further constructions. Since this presentation is of an auxiliary nature, proofs of most of the propositions formulated will not be included. If needed, the reader can find the corresponding proofs in the monographs of Feller [81] and Cramér [48]. It is assumed that the reader is familiar with such elementary concepts of probability theory as the event, the probability of an event, etc.


The concept of the random quantity is of great importance in the construction of the theory of self-organizing systems. We shall limit ourselves to random quantities which take real values. Here, in addition to the conventional so-called univariate random quantities, we also consider multivariate random quantities, whose values are finite ordered ensembles of real numbers or, what is the same, real vectors of a particular (finite) dimension.

It is also important to differentiate between continuous and discrete random quantities. A continuous random quantity can take any values in a particular region (open set) of the corresponding vector space, for example on some interval of the real axis (or on the entire axis) in the case of univariate random quantities. The totality of possible values of a discrete random quantity, however, can only be a discrete set of points, i.e., a set each point of which can be enclosed in a sphere (possibly of very small radius) containing no other points of the same set. An example of a discrete set is the set of all points of a Euclidean space which have integral coordinates.

The randomness of the quantities we have considered manifests itself in so-called trials. In each trial the random quantity takes a particular value from its domain of definition. The probability that the random quantity will take a particular value is determined by the distribution law of this random quantity. The distribution law of a discrete random quantity x (univariate or multivariate) is specified with the aid of a real function f(x), defined for all values which the given random quantity can take, such that for any value $x_i$ the magnitude $f(x_i)$ is equal to the probability that the random quantity x will take the value $x_i$ in the given trial.

The domain of definition of the discrete random quantities we are considering can be either finite or denumerably infinite. Evidently, in both cases the normalization condition

$$\sum_i f(x_i) = 1 \qquad (47)$$

is satisfied, where the summation extends over the entire domain of definition of the given random quantity.

When the random quantity x is continuous, its distribution law is given with the aid of the so-called probability density function φ(x). This function is presumed to be defined in the region M of definition of the random quantity x and integrable in this region. In each trial the probability p(N) that the random quantity x will take a value from some subregion N of its region of definition is equal to the integral of the probability density over this subregion:

$$p(N) = \int_N \varphi(x)\,dx. \qquad (48)$$

Whence follows directly the normalization condition

$$\int_M \varphi(x)\,dx = 1. \qquad (49)$$


Two random quantities x and y are termed mutually independent if the distribution law of y does not change when x takes a particular value, and vice versa. Similarly, the independence of a set of random quantities means that, for every quantity x chosen from this set, the distribution law of x does not change when all the other quantities of the set take any values.

Trials performed with a particular random quantity x are termed independent trials if the distribution law of x remains unchanged in each trial and, consequently, does not depend on the values which x took in the previous trials.

The domain of definition of the continuous random quantity can 
always, if need be, be extended over the entire space, assuming that 
everywhere except in the original domain of definition the probability 
density is equal to zero. 

We can also approximate continuous distribution laws by discrete laws and vice versa. In the first case it is sufficient to partition the domain of definition M of the corresponding continuous random quantity x into a finite number of sufficiently small (not only in volume, but also in diameter) subdomains $M_1, M_2, \dots, M_n$, select within each such subdomain $M_i$ a point $x_i$, and introduce the discrete distribution law f(x) on the selected points, setting

$$f(x_i) = \int_{M_i} \varphi(x)\,dx \qquad (i = 1, 2, \dots, n),$$

where φ(x) is the probability density of the original continuous random quantity. In this case the probabilities previously associated with the corresponding subdomains are concentrated in the individual points.

In the reverse transition from the discrete distribution law to the continuous one there is, on the contrary, a "diffusion" of the probability initially concentrated in the individual points $x_i$ into the corresponding subdomains $M_i$, so that for the probability density function φ(x) thus appearing the following relations hold:

$$\int_{M_i} \varphi(x)\,dx = f(x_i) \qquad (i = 1, 2, \dots, n).$$
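Both transitions can be sketched for a univariate density on an interval. This is a minimal sketch assuming equal-width cells $M_i$, points $x_i$ at the cell midpoints, Simpson's rule for the cell integrals (exact for the linear density used here), and uniform spreading for the reverse step; the density φ(x) = 2x on [0, 1] and all function names are illustrative choices.

```python
def simpson(phi, a, b):
    """Simpson's rule on [a, b]; exact for polynomials of degree <= 3."""
    mid = (a + b) / 2
    return (b - a) / 6 * (phi(a) + 4 * phi(mid) + phi(b))


def discretize(phi, lo, hi, n):
    """Continuous -> discrete: cut M = [lo, hi] into n equal cells M_i,
    take x_i as the cell midpoint and f(x_i) = integral of phi over M_i."""
    h = (hi - lo) / n
    points, probs = [], []
    for i in range(n):
        a, b = lo + i * h, lo + (i + 1) * h
        points.append((a + b) / 2)
        probs.append(simpson(phi, a, b))
    return points, probs


def spread(points, probs, h):
    """Discrete -> continuous: spread f(x_i) uniformly over the cell of
    width h centred at x_i, so the density integrates to f(x_i) there."""
    def density(x):
        for xi, fi in zip(points, probs):
            if xi - h / 2 <= x < xi + h / 2:
                return fi / h
        return 0.0
    return density


def phi(x):
    return 2 * x  # density on M = [0, 1] (illustrative choice)


pts, f = discretize(phi, 0.0, 1.0, 4)
print(pts)     # [0.125, 0.375, 0.625, 0.875]
print(f)       # approximately [0.0625, 0.1875, 0.3125, 0.4375]
print(sum(f))  # approximately 1.0, the normalization condition (47)
g = spread(pts, f, 0.25)
print(g(0.8))  # approximately 1.75 = 0.4375 / 0.25
```

Refining the partition (increasing n) makes the spread density approach φ, which is the sense of the limit distribution introduced next.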


Frequently it is necessary to consider infinite sequences of discrete distribution laws $f_i(x)$ (i = 1, 2, ...) having some continuous distribution law with probability density function φ(x) as their so-called limit distribution law. The concept of the limit distribution has the following precise meaning. First, the domains of values $M_i$ of the discrete random quantities with the distribution laws $f_i(x)$ converge to the domain of values M of the continuous random quantity x. In the cases we consider, this convergence means that all $M_i$ are contained in M and that for any arbitrarily small positive number ε there is a number N such that, for any point x of M, each of the sets $M_i$ with i > N contains at least one point at a distance less than ε from x. Second, for any subdomain P of the domain M and any arbitrarily small positive number δ, the following inequality must be satisfied for all numbers i beginning with some number:

$$\Bigl|\int_P \varphi(x)\,dx - \sum_{x \in P \cap M_i} f_i(x)\Bigr| < \delta. \qquad (50)$$

The summation on the left side of this formula is taken over all points of $M_i$ contained in the subdomain P.

The concepts of the mean value (mathematical expectation) of a random quantity and of its second order central moments are of great importance for the further constructions.

Let $x = (x_1, x_2, \dots, x_n)$ be an n-dimensional continuous random quantity with probability density function $\varphi(x_1, x_2, \dots, x_n)$. As noted above, without loss of generality the function φ can be considered defined over the entire infinite space. Then the mean value of the random quantity x is defined as the vector $m = (m_1, m_2, \dots, m_n)$ computed from the equation

$$m = (m_1, m_2, \dots, m_n) = \int \cdots \int (x_1, x_2, \dots, x_n)\,\varphi(x_1, x_2, \dots, x_n)\,dx_1\,dx_2 \cdots dx_n. \qquad (51)$$

The second order central moments $\lambda_{ik}$ are determined by the equations

$$\lambda_{ik} = \int \cdots \int (x_i - m_i)(x_k - m_k)\,\varphi(x_1, x_2, \dots, x_n)\,dx_1\,dx_2 \cdots dx_n \qquad (i, k = 1, 2, \dots, n). \qquad (52)$$
For a univariate random quantity x there is a natural second order central moment determined by the equation

$$d = \int (x - m)^2\,\varphi(x)\,dx. \qquad (53)$$

This moment is usually termed the variance of the corresponding distribution.

It is natural to transfer all these concepts and their definitions to discrete random quantities as well.

To do this we need only replace, in equations (51)-(53), integration by summation extended over the entire domain of definition M of the corresponding discrete (vector) quantity x, and write the probability distribution function f(x) of this quantity in place of the probability density function $\varphi(x_1, x_2, \dots, x_n)$. As a result, equation (51), for example, is rewritten as

$$m = \sum_{x \in M} x\, f(x). \qquad (54)$$

All the remaining equations are changed similarly.
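The discrete forms of (51) and (52) translate directly into sums. This is a minimal sketch in which a distribution is assumed to be given as a dictionary mapping each value (an n-tuple of reals) to its probability; the function name `moments` and both test distributions are illustrative.

```python
def moments(f):
    """f maps each value x (a tuple of n reals) of a discrete random quantity
    to its probability f(x). Returns the mean vector, as in (54), and the
    matrix of second order central moments lambda_ik, the discrete form of (52)."""
    assert abs(sum(f.values()) - 1.0) < 1e-12, "normalization condition (47)"
    n = len(next(iter(f)))
    m = [sum(x[i] * p for x, p in f.items()) for i in range(n)]
    lam = [[sum((x[i] - m[i]) * (x[k] - m[k]) * p for x, p in f.items())
            for k in range(n)] for i in range(n)]
    return m, lam


# Univariate case: a fair die, written as 1-tuples.
m, lam = moments({(k,): 1 / 6 for k in range(1, 7)})
print(m, lam)  # approximately [3.5] [[2.9167]], i.e. d = 35/12

# Bivariate case: (0, 0) and (1, 1) with probability 1/2 each.
m2, lam2 = moments({(0, 0): 0.5, (1, 1): 0.5})
print(m2, lam2)  # [0.5, 0.5] [[0.25, 0.25], [0.25, 0.25]]
```

In the bivariate case the determinant 0.25 · 0.25 − 0.25 · 0.25 is zero, so the corresponding form Q is degenerate: all values lie on the line $x_1 = x_2$.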

For multivariate random quantities it is convenient to combine the second order central moments $\lambda_{ik}$ into the matrix $\|\lambda_{ik}\|$ and to construct from them (when they are finite) the quadratic form $Q(t_1, t_2, \dots, t_n) = \sum_{i,k} \lambda_{ik} t_i t_k$. By an orthogonal transformation (rotation) of the coordinate system this form can always be reduced to the form $\sum_{i=1}^{m} a_i t_i'^2$, where $t_i'$ are new coordinates and $a_i$ are positive coefficients (i = 1, 2, ..., m). Forms of this type are termed positive semidefinite. If m = n, i.e., if the number of squares after reduction of the form Q is exactly equal to the dimension of the space, then the form Q is termed positive definite. In this case its determinant $|Q| = |\lambda_{ik}|$ is necessarily nonzero and (strictly) positive.


For a positive definite form $Q = \sum_{i,k} \lambda_{ik} t_i t_k$ we can define the inverse form $Q^{-1}(t_1, t_2, \dots, t_n) = \sum_{i,k} \mu_{ik} t_i t_k$, whose coefficients are given by the equations $\mu_{ik} = Q_{ik}/|Q|$, where $Q_{ik}$ is the algebraic complement (cofactor) of the element $\lambda_{ik}$ of the determinant $|Q| = |\lambda_{ik}|$ (i, k = 1, 2, ..., n). This form will again be positive definite. The matrix $\|\mu_{ik}\|$ thus obtained is obviously the inverse of the matrix $\|\lambda_{ik}\|$, since the latter is symmetric (the matrix $\|\mu_{ik}\|$ is, of course, symmetric as well).
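The cofactor formula for the inverse form can be checked with exact rational arithmetic. This is a small sketch for integer or rational coefficients only (cofactor expansion is exponential in n, so it suits illustrative sizes); the helper names are hypothetical.

```python
from fractions import Fraction


def minor(a, i, k):
    """Matrix a with row i and column k deleted."""
    return [row[:k] + row[k + 1:] for j, row in enumerate(a) if j != i]


def det(a):
    """Determinant by cofactor expansion along the first row (small n only)."""
    if not a:
        return 1  # determinant of the empty matrix
    return sum((-1) ** k * a[0][k] * det(minor(a, 0, k)) for k in range(len(a)))


def inverse_form(lam):
    """Coefficients mu_ik = Q_ik / |Q| of the inverse form, where
    Q_ik = (-1)^(i+k) * minor is the algebraic complement of lambda_ik."""
    n = len(lam)
    D = det(lam)
    if D <= 0:
        raise ValueError("|Q| must be positive for a positive definite form")
    return [[Fraction((-1) ** (i + k) * det(minor(lam, i, k))) / D
             for k in range(n)] for i in range(n)]


lam = [[2, 1], [1, 2]]
mu = inverse_form(lam)
print(mu)  # [[Fraction(2, 3), Fraction(-1, 3)], [Fraction(-1, 3), Fraction(2, 3)]]
check = [[sum(lam[i][j] * mu[j][k] for j in range(2)) for k in range(2)] for i in range(2)]
print(check == [[1, 0], [0, 1]])  # True
```

Positivity of |Q| is only a necessary condition for positive definiteness, so the check in `inverse_form` rejects some, not all, unsuitable inputs.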


The following fundamental result [48] is of great importance in probability theory.

Theorem 1. If the n-variate random (continuous or discrete) quantities $x_1, x_2, \dots, x_k, \dots$ are independent and have the same distribution, with finite second order central moments $\lambda_{ik}$ for which the form $Q(t_1, t_2, \dots, t_n) = \sum_{i,k} \lambda_{ik} t_i t_k$ is positive definite, and with mean value equal to zero, then as $k \to \infty$ the quantity $x = \frac{1}{\sqrt{k}}(x_1 + x_2 + \dots + x_k)$ has a limit continuous distribution, also with zero mean value, with the n-variate probability density function

$$\varphi(t_1, t_2, \dots, t_n) = \frac{1}{(2\pi)^{n/2}\sqrt{|Q|}}\, e^{-\frac{1}{2} Q^{-1}(t_1, t_2, \dots, t_n)}.$$

When the form Q is degenerate, we turn to some subspace L of the original space, replacing the random quantities $x_1, x_2, \dots, x_k$ by their projections onto this subspace. The subspace L is chosen so that the new form Q of the central moments obtained as the result of this projection is positive definite, while the projection onto any subspace perpendicular to L would lead to a degenerate form (such a subspace always exists). Applying theorem 1 in the constructed space L then gives a distribution in the original space as well, since all possible values of the random quantities $x_1, x_2, \dots, x_k$ (and hence of their sums) lie in this subspace with probability 1.


In some cases we limit ourselves to selecting, among all subspaces with a nondegenerate form Q, a subspace K of the maximal possible dimension. It can be shown that the subspace L can be obtained from the subspace K by a nondegenerate linear transformation.

If, under the conditions of theorem 1, the mean value of the quantities $x_1, x_2, \dots, x_k$ is nonzero, then, denoting this mean value by $m = (m_1, m_2, \dots, m_n)$, we easily find that the mean value of the random quantity $x = \frac{1}{\sqrt{k}}(x_1 + x_2 + \dots + x_k)$ is $(m_1\sqrt{k}, m_2\sqrt{k}, \dots, m_n\sqrt{k})$, and that for sufficiently large k a good approximation to the distribution law of the random quantity x is the continuous law with the probability density function

$$\varphi(t_1, t_2, \dots, t_n) = \frac{1}{(2\pi)^{n/2}\sqrt{|Q|}}\, e^{-\frac{1}{2} Q^{-1}(t_1 - m_1\sqrt{k},\ t_2 - m_2\sqrt{k},\ \dots,\ t_n - m_n\sqrt{k})}. \qquad (55)$$

Let us now apply theorem 1 and equation (55) to the so-called binomial distribution. The binomial distribution arises from independent trials conducted according to the so-called Bernoulli scheme. This scheme, in addition to the independence of the trials, is characterized by the fact that each trial has only two possible outcomes, occurring with the probabilities p and q = 1 − p respectively. Let us term the first outcome success and the second failure of the trial, and let us introduce the random quantity $x_i$, taking the value 1 in case of success of the i-th trial and the value 0 in case of its failure (i = 1, 2, ..., n).
The random quantity $k = x_1 + x_2 + \dots + x_n$ is clearly equal to the total number of successes in $n$ independent trials. Let us denote by $C_n^k$ the number of combinations of $n$ elements taken $k$ at a time (by definition $C_n^0 = 1$); then it is easy to find that the (discrete) distribution law of the random quantity $k$ is given by the function

$$
f_n(k) = C_n^k\, p^k q^{n-k} \qquad (k = 0, 1, \dots, n). \tag{56}
$$


This law is called the binomial distribution law, since the quantity $C_n^k p^k q^{n-k}$ is obviously the $(n - k + 1)$-th term of the expansion of $(p + q)^n$ by Newton's binomial formula (with the terms ordered by decreasing powers of $p$).
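As a quick numerical check of (56), the following Python sketch (parameter values chosen arbitrarily) sums the binomial probabilities, which must give $(p + q)^n = 1$, and computes the mean and variance, which should come out as $np$ and $np(1 - p)$:

```python
from math import comb

def binomial_pmf(n: int, k: int, p: float) -> float:
    """f_n(k) = C_n^k p^k q^(n - k) from equation (56), with q = 1 - p."""
    q = 1.0 - p
    return comb(n, k) * p**k * q**(n - k)

n, p = 10, 0.3
probs = [binomial_pmf(n, k, p) for k in range(n + 1)]
# The terms of the expansion of (p + q)^n add up to (p + q)^n = 1.
total = sum(probs)
mean = sum(k * f for k, f in enumerate(probs))
variance = sum((k - mean) ** 2 * f for k, f in enumerate(probs))
print(f"sum = {total:.6f}, mean = {mean:.4f}, variance = {variance:.4f}")
```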

The random quantities $x_1, x_2, \dots, x_n$ all have the same distribution law, with mean value $m = 1 \cdot p + 0 \cdot (1 - p) = p$ and variance (second central moment) $d = (1 - p)^2 p + (0 - p)^2 (1 - p) = p(1 - p)$. It follows from theorem 1 and equation (55) that for sufficiently large $n$ a good approximation for the distribution law of the quantity $x = k/\sqrt{n} = \frac{1}{\sqrt{n}}(x_1 + x_2 + \dots + x_n)$ is the distribution law with the probability density function

$$
\varphi(t) = \frac{1}{\sqrt{2\pi p(1-p)}}\, e^{-\frac{(t - p\sqrt{n})^2}{2p(1-p)}}. \tag{57}
$$

The distribution law with a probability density function of the form $\frac{1}{\sqrt{2\pi a}}\, e^{-\frac{(t - m)^2}{2a}}$ (with $a > 0$) will be called the (generalized) univariate normal distribution law. It is not difficult to see that the mean value of a random quantity distributed according to this law is equal to $m$ and that its variance is equal to $a$.

It is easy to see that when a normally distributed random quantity $x$ is multiplied by a constant factor $c \neq 0$, the new quantity $y = cx$ is also normally distributed, and its mean value and variance are, respectively, $c$ times and $c^2$ times the mean value and the variance of the original quantity $x$.

Applying this property with $c = \sqrt{n}$ to equation (57) (recall that $k = \sqrt{n}\,x$), we arrive at the following proposition.

Theorem 2. For a sufficiently large number $n$ of Bernoulli trials with probability of success $p$, the distribution law of the total number of successes $k$ can be approximately expressed by the normal law with the probability density function

$$
\varphi(t) = \frac{1}{\sqrt{2\pi np(1-p)}}\, e^{-\frac{(t - pn)^2}{2np(1-p)}}.
$$





The mean value of the random quantity corresponding to this law is $pn$ and its variance is $np(1 - p)$; these agree with the exact values of the mean value and the variance of the original discrete random quantity $k$.
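The closeness asserted by theorem 2 can be seen by comparing, at a few integer points near $pn$, the exact probabilities (56) with the values of the normal density. The sketch below does this for the arbitrarily chosen $n = 1000$, $p = 0.4$; point-by-point agreement of this kind is the local form of the de Moivre-Laplace theorem, a standard result that the text does not state explicitly.

```python
from math import exp, lgamma, log, pi, sqrt

def binomial_pmf(n: int, k: int, p: float) -> float:
    """Exact f_n(k) of equation (56), evaluated through logarithms to avoid overflow."""
    log_f = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1) + k * log(p) + (n - k) * log(1 - p)
    return exp(log_f)

def normal_density(t: float, n: int, p: float) -> float:
    """Density of theorem 2: the normal law with mean pn and variance np(1 - p)."""
    var = n * p * (1 - p)
    return exp(-(t - p * n) ** 2 / (2 * var)) / sqrt(2 * pi * var)

n, p = 1000, 0.4
for k in (380, 400, 420):
    exact, approx = binomial_pmf(n, k, p), normal_density(k, n, p)
    print(k, f"{exact:.5f}", f"{approx:.5f}", f"relative error {abs(exact - approx) / exact:.3%}")
```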

Because in deriving the statement of theorem 2 the quantity $x$ was multiplied by the factor $\sqrt{n}$, the continuous distribution $\varphi$ for the quantity $k$ obtained in theorem 2 does not, generally speaking, have the property of being a limit distribution for the original (discrete) distribution $f$ of the quantity $k$ as the number of trials $n$ increases without bound.

However, it is not difficult to see that for sufficiently large $n$ the probabilities of finding the quantity $k$ in any interval whose length is of the order of $\sqrt{n}$ (i.e., has the form $c\sqrt{n}$, where $c$ is a constant), calculated according to the distributions $\varphi$ and $f$, differ from one another by arbitrarily small amounts.

In practice one usually needs to calculate the probability of finding the quantity $k$ in intervals of the form $[pn, pn + z\sigma]$, where the quantity $\sigma = \sqrt{np(1-p)}$, equal to the square root of the variance of the distribution $\varphi$ (and hence of the distribution $f$ as well), is called the mean square deviation (or mean square error) of the distributions $\varphi$ and $f$. The following theorem holds.

Theorem 3. For any positive number $z$, the probability $p(z)$ that the total number of successes in $n$ Bernoulli trials with probability of success $p$ lies in the interval $[pn, pn + z\sqrt{np(1-p)}]$ is given by the approximate equation

$$
p(z) \approx \Phi(z) = \frac{1}{\sqrt{2\pi}} \int_0^z e^{-t^2/2}\, dt.
$$

For any $z$, by choosing $n$ sufficiently large, we can make the error in calculating $p(z)$ by this equation arbitrarily small.
Here are the numerical values of the function $\Phi(z)$ to four decimal places for some values of $z$: $\Phi(1) = 0.3413$; $\Phi(2) = 0.4772$; $\Phi(3) = 0.4987$; for $z > 4$ the values of $\Phi(z)$ differ from 0.5000 by less than half a unit in the fourth decimal place.

The approximate equation in the statement of theorem 3 is called the de Moivre-Laplace formula. As theorem 3 indicates, the accuracy of this formula increases with the number $n$ of trials performed.
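The function $\Phi(z)$ can be computed from the error function, $\Phi(z) = \tfrac{1}{2}\operatorname{erf}(z/\sqrt{2})$. The following sketch reproduces the tabulated values and compares the exact binomial probability of the interval $[pn, pn + z\sigma]$ with $\Phi(z)$ for two values of $n$ (the choice $p = 0.5$, $z = 2$ is arbitrary), illustrating how the error of the de Moivre-Laplace formula shrinks as $n$ grows:

```python
from math import ceil, erf, exp, floor, lgamma, log, sqrt

def Phi(z: float) -> float:
    """Phi(z) = (1/sqrt(2*pi)) * integral from 0 to z of exp(-t^2/2) dt, written via erf."""
    return 0.5 * erf(z / sqrt(2.0))

def binomial_pmf(n: int, k: int, p: float) -> float:
    """Exact binomial probability (56), computed with logarithms for large n."""
    log_f = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1) + k * log(p) + (n - k) * log(1 - p)
    return exp(log_f)

def exact_p(n: int, p: float, z: float) -> float:
    """Exact probability that pn <= k <= pn + z*sigma, summed from the binomial law."""
    sigma = sqrt(n * p * (1 - p))
    lo, hi = ceil(p * n), floor(p * n + z * sigma)
    return sum(binomial_pmf(n, k, p) for k in range(lo, hi + 1))

print([round(Phi(z), 4) for z in (1, 2, 3)])
for n in (100, 10_000):
    print(n, round(exact_p(n, 0.5, 2.0), 4), "vs de Moivre-Laplace", round(Phi(2.0), 4))
```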

Let us consider a series of independent trials, each of which has $m$ different outcomes, and let $p_i$ ($p_i > 0$) denote the probability of the $i$-th outcome of a trial ($i = 1, 2, \dots, m$). We denote the total number of trials conducted by $n$ and the number of those which ended with the $i$-th outcome by $k_i$ ($i = 1, 2, \dots, m$). It is easy to see that each of the quantities $k_i$, considered by itself, is distributed according to the binomial law. We pose the problem of finding the joint (multivariate) distribution law of several of the quantities $k_i$, for example of $k_1, k_2, \dots, k_r$ ($1 \le r \le m$). It is not difficult to verify that the solution of this problem is given by the equation

$$
f(k_1, k_2, \dots, k_r) = \frac{n!}{k_1!\, k_2! \cdots k_r!\, (n - k_1 - \dots - k_r)!}\; p_1^{k_1} p_2^{k_2} \cdots p_r^{k_r}\, (1 - p_1 - \dots - p_r)^{n - k_1 - \dots - k_r}. \tag{58}
$$
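Equation (58) is easy to check numerically: summed over all admissible $(k_1, \dots, k_r)$ it must give 1, and the marginal law of each $k_i$ must be binomial. A small sketch with the arbitrarily chosen values $n = 6$, $r = 2$, $m = 3$:

```python
from itertools import product
from math import comb, factorial, prod

def polynomial_pmf(ks, ps, n):
    """Equation (58): joint probability of k_1, ..., k_r; all other outcomes are lumped together."""
    rest_k = n - sum(ks)
    if rest_k < 0:
        return 0.0
    rest_p = 1.0 - sum(ps)
    coeff = factorial(n) // (prod(factorial(k) for k in ks) * factorial(rest_k))
    return coeff * prod(p**k for p, k in zip(ps, ks)) * rest_p**rest_k

n, ps = 6, [0.2, 0.3]   # r = 2 of m = 3 outcomes; the third outcome has probability 0.5
table = {ks: polynomial_pmf(ks, ps, n) for ks in product(range(n + 1), repeat=len(ps))}
total = sum(table.values())
# each k_i taken by itself must be binomial: compare the marginal law of k_1
marginal = [sum(v for ks, v in table.items() if ks[0] == j) for j in range(n + 1)]
binomial = [comb(n, j) * 0.2**j * 0.8**(n - j) for j in range(n + 1)]
print(round(total, 12), max(abs(a - b) for a, b in zip(marginal, binomial)))
```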


For this distribution law, which is customarily called the polynomial (multinomial) distribution law, we can obtain a continuous approximation just as we did above for its particular (univariate) case. To do this, let us consider the multivariate random quantities $x^j = (x_1^j, x_2^j, \dots, x_r^j)$ such that $x^j$ takes one of the values $(1, 0, \dots, 0)$, $(0, 1, \dots, 0)$, $\dots$, $(0, 0, \dots, 1)$ or $(0, 0, \dots, 0)$ according to whether the outcome of the $j$-th trial of the series is the first, second, $\dots$, $r$-th, or any other outcome ($j = 1, 2, \dots, n$). All the random quantities $x^j$ have the same distribution law, and their mean values are obviously equal to $(p_1, p_2, \dots, p_r)$. The second central moment $\lambda_{ii}$ is obviously equal to $(1 - p_i)^2 p_i + (0 - p_i)^2 (1 - p_i) = p_i(1 - p_i)$ ($i = 1, 2, \dots, r$). The second central moment $\lambda_{ik}$ with $i \neq k$ is also easily calculated:

$$
\lambda_{ik} = (1 - p_i)(0 - p_k)\, p_i + (0 - p_i)(1 - p_k)\, p_k + (0 - p_i)(0 - p_k)(1 - p_i - p_k) = -p_i p_k.
$$

The determinant $|\lambda_{ik}|$ of the matrix $\|\lambda_{ik}\|$ of central moments is in this case, as is easily shown, equal to the product $p_1 p_2 \cdots p_r (1 - p_1 - p_2 - \dots - p_r)$. Thus the matrix $\|\lambda_{ik}\|$ is degenerate only when $r = m$. In all the remaining cases the quadratic form

$$
Q(t_1, t_2, \dots, t_r) = \sum_{i,k} \lambda_{ik} t_i t_k = \sum_{i} p_i(1 - p_i)\, t_i^2 - \sum_{i \neq k} p_i p_k\, t_i t_k
$$

is positive definite: being a form of second central moments it is nonnegative, and its determinant $|Q| = |\lambda_{ik}|$ is positive.
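The determinant formula for $\|\lambda_{ik}\|$ can be verified exactly with rational arithmetic. The sketch below (probabilities chosen arbitrarily) builds the matrix of central moments and compares its determinant with $p_1 p_2 \cdots p_r(1 - p_1 - \dots - p_r)$; it also confirms that the matrix is degenerate when $r = m$:

```python
from fractions import Fraction

def covariance_matrix(ps):
    """lambda_ii = p_i(1 - p_i), lambda_ik = -p_i p_k for i != k."""
    r = len(ps)
    return [[ps[i] * (1 - ps[i]) if i == k else -ps[i] * ps[k] for k in range(r)] for i in range(r)]

def determinant(matrix):
    """Exact determinant by Gaussian elimination over Fractions."""
    a = [row[:] for row in matrix]
    n, det = len(a), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det
        det *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= factor * a[c][j]
    return det

ps = [Fraction(1, 5), Fraction(1, 4), Fraction(1, 3)]   # r = 3 of m = 4 outcomes
lhs = determinant(covariance_matrix(ps))
rhs = ps[0] * ps[1] * ps[2] * (1 - sum(ps))
print(lhs, rhs)
```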

Applying theorem 1 to the random quantity $y = \frac{1}{\sqrt{n}}(y_1 + y_2 + \dots + y_n)$, where $y_j = (x_1^j - p_1, x_2^j - p_2, \dots, x_r^j - p_r)$, we conclude that for $r < m$ it has a limit (as $n \to \infty$) distribution law with the probability density function

$$
\varphi(t_1, \dots, t_r) = \frac{1}{(2\pi)^{r/2}\sqrt{|Q|}}\, e^{-\frac{1}{2} Q^{-1}(t_1, \dots, t_r)}.
$$

The multivariate random quantity $z = (k_1/n - p_1, k_2/n - p_2, \dots, k_r/n - p_r)$ is connected with $y$ by the relation $z = \frac{1}{\sqrt{n}}\, y$ and will therefore have the same (normal) distribution law, but with all second central moments $n$ times smaller than those of $y$. Consequently, the probability density function of the distribution law of $z$ is

$$
\psi(t_1, \dots, t_r) = \frac{n^{r/2}}{(2\pi)^{r/2}\sqrt{|Q|}}\, e^{-\frac{n}{2} Q^{-1}(t_1, \dots, t_r)}.
$$
It is now not difficult to establish the following result.

Theorem 4. Let there be given a series of independent trials with $m$ different outcomes having, respectively, the probabilities $p_1, p_2, \dots, p_m$ (the same for all trials). If in a series of $n$ trials $k_1, k_2, \dots, k_m$ denote the numbers of trials ending respectively with the 1st, 2nd, $\dots$, $m$-th outcome, then for any positive number $\varepsilon$ given in advance, the probability $p$ that all the inequalities $|k_i/n - p_i| < \varepsilon$ ($i = 1, 2, \dots, m$) hold simultaneously satisfies the estimate $p > 1 - a e^{-bn}$, where $a$ and $b$ are positive constants not depending on $n$.

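To prove this theorem we note, first, that all the inequalities $|k_i/n - p_i| < \varepsilon$ ($i = 1, 2, \dots, m$) are obviously satisfied if the $m - 1$ inequalities

$$
\left|\frac{k_i}{n} - p_i\right| < \frac{\varepsilon}{m - 1} \qquad (i = 1, 2, \dots, m - 1) \tag{59}
$$

are satisfied. Indeed, since $k_m = n - k_1 - \dots - k_{m-1}$ and $p_m = 1 - p_1 - \dots - p_{m-1}$,

$$
\left|\frac{k_m}{n} - p_m\right| = \left|\sum_{i=1}^{m-1}\left(p_i - \frac{k_i}{n}\right)\right| \le \sum_{i=1}^{m-1}\left|\frac{k_i}{n} - p_i\right| < (m - 1)\,\frac{\varepsilon}{m - 1} = \varepsilon.
$$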








It follows from the considerations preceding the statement of theorem 4 that the quantities $z_i = k_i/n - p_i$ ($i = 1, 2, \dots, m - 1$) have a limit (as $n \to \infty$) distribution law with a probability density function of the form $g(z_1, z_2, \dots, z_{m-1}) = A\, n^{(m-1)/2} e^{-nP(z_1, z_2, \dots, z_{m-1})}$, where $A$ is a positive constant and $P$ is a positive definite quadratic form whose coefficients do not depend on $n$. For sufficiently large $n$, the probability $\bar{p}$ that at least one of the inequalities (59) is not satisfied obviously has an upper estimate of the form

$$
\bar{p} < \int \cdots \int_{R_{\delta_1}} g(z_1, z_2, \dots, z_{m-1})\, dz_1\, dz_2 \cdots dz_{m-1}, \tag{60}
$$

where the region $R_{\delta_1}$ is the exterior of the hypercube bounded by the hyperplanes $z_i = \pm\delta_1$ ($\delta_1 < \varepsilon/(m - 1)$; $i = 1, 2, \dots, m - 1$).
After rotating the coordinate axes so as to reduce the form $P$ to a sum of squares with positive coefficients $b_1, b_2, \dots, b_{m-1}$, we can inscribe in the original hypercube (which is now rotated relative to the new axes $z_1', \dots, z_{m-1}'$) a new hypercube bounded by the hyperplanes $z_i' = \pm\delta$ ($\delta < \delta_1$; $i = 1, 2, \dots, m - 1$); we denote its exterior by $R_\delta$. Integrating the transformed probability density function over $R_\delta$ again gives an estimate of the form (60), which can be simplified, at the cost of making it coarser, by replacing all the coefficients $b_1, b_2, \dots, b_{m-1}$ by the smallest of them, denoted by $g$. As a result we obtain the new estimate

$$
\bar{p} < A\, n^{(m-1)/2} \int \cdots \int_{R_\delta} e^{-ng(z_1'^2 + \dots + z_{m-1}'^2)}\, dz_1' \cdots dz_{m-1}'. \tag{61}
$$

Since $R_\delta$ is contained in the union of the $m - 1$ regions $|z_i'| \ge \delta$, and since

$$
\int_\delta^\infty e^{-ngt^2}\, dt < \frac{1}{\delta}\int_\delta^\infty t\, e^{-ngt^2}\, dt = \frac{e^{-ng\delta^2}}{2ng\delta}, \qquad \int_{-\infty}^{\infty} e^{-ngt^2}\, dt = \sqrt{\frac{\pi}{ng}},
$$

it is easy to obtain (using $n \ge 1$) the final estimate

$$
\bar{p} < \frac{(m - 1)\, A\, \pi^{(m-2)/2}}{g^{m/2}\, \delta}\; e^{-ng\delta^2}. \tag{62}
$$

Denoting $(m - 1) A \pi^{(m-2)/2}/(g^{m/2}\delta)$ by the letter $a$ and $g\delta^2$ by the letter $b$, and recalling that $p \ge 1 - \bar{p}$, we obtain the required estimate $p > 1 - a e^{-bn}$, though for the present only for all $n$ beginning with some, possibly quite large, value. We can, however, also take into account the remaining finite set $M$ of values of $n$ by increasing the constant $a$ if necessary. Since the probability $p$ is clearly greater than zero, it suffices to choose $a$ large enough that the right-hand side of the estimate becomes negative for all values of $n$ belonging to the set $M$.

Thereby theorem 4 is fully proved.
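For $m = 2$ theorem 4 can be illustrated with exact binomial probabilities. The constants $a$, $b$ obtained in the proof are not computed here; instead, as a swapped-in comparison, the sketch uses Hoeffding's inequality, a standard result outside this text, which gives the explicit choice $a = 2$, $b = 2\varepsilon^2$ for the probability $P(|k/n - p| \ge \varepsilon)$. The parameter values are arbitrary.

```python
from math import exp, lgamma, log

def binomial_pmf(n, k, p):
    """Exact binomial probability (56), computed with logarithms."""
    log_f = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1) + k * log(p) + (n - k) * log(1 - p)
    return exp(log_f)

def violation_probability(n, p, eps):
    """Exact probability that |k/n - p| >= eps (the case m = 2 of theorem 4)."""
    return sum(binomial_pmf(n, k, p) for k in range(n + 1) if abs(k / n - p) >= eps)

p, eps = 0.3, 0.05
rows = []
for n in (100, 200, 400, 800):
    exact = violation_probability(n, p, eps)
    bound = 2 * exp(-2 * n * eps**2)   # Hoeffding's bound: a = 2, b = 2 * eps**2
    rows.append((n, exact, bound))
    print(n, f"{exact:.3e}", f"{bound:.3e}")
```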


In concluding the present section we shall describe one more frequently encountered distribution, the so-called Poisson distribution. It can be treated as an approximation to the binomial distribution (56) under the conditions that the number of trials $n$ is large, the probability of success $p$ in each trial is small, and the product $\lambda = np$ is neither small nor large. In this case the probability $p_k$ that exactly $k$ trials end in success is expressed by the approximate equation

$$
p_k \approx \frac{\lambda^k}{k!}\, e^{-\lambda}. \tag{63}
$$

In particular, for $k = 0$ we have $p_0 \approx e^{-\lambda}$.

The Poisson distribution attains its maximum probability at the largest integer $k$ satisfying the inequality $k \le \lambda$. In the theory of discrete self-organizing systems we encounter the Poisson distribution when automata are taught with words or sequences of words of differing length: when the words used in teaching are selected at random, the Poisson distribution frequently gives a sufficiently good approximation for the distribution law of the word lengths.

We note that the mean value of a quantity having the Poisson distribution (63) is equal to $\lambda$.
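The quality of the Poisson approximation (63) can be checked directly against the binomial law (56). The sketch below uses the arbitrary values $n = 1000$, $p = 0.0025$ (so $\lambda = 2.5$) and also checks the location of the maximum and the mean value stated above:

```python
from math import comb, exp, factorial

def binomial_pmf(n, k, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """Equation (63): p_k = lambda^k e^(-lambda) / k!."""
    return lam**k * exp(-lam) / factorial(k)

n, p = 1000, 0.0025
lam = n * p                    # lambda = np = 2.5
ks = range(16)                 # the Poisson tail beyond k = 15 is negligible for lambda = 2.5
binom = [binomial_pmf(n, k, p) for k in ks]
poisson = [poisson_pmf(k, lam) for k in ks]
max_gap = max(abs(b - q) for b, q in zip(binom, poisson))
mode = max(ks, key=lambda k: poisson[k])
mean = sum(k * q for k, q in zip(ks, poisson))
print(f"max gap = {max_gap:.5f}, most probable k = {mode}, mean = {mean:.4f}")
```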


§3. A QUANTITATIVE MEASURE OF SELF-ORGANIZATION AND SELF-IMPROVEMENT 
IN AUTOMATA 


In the first section we encountered the concept of self-adaptation in automata: it is natural to call an automaton self-adaptive if in the course of time it changes its responses to the questions fed to it (for some cycling of the input and output information). However, not every self-adaptation should be identified with self-organization. In accordance with the intuitive idea of self-organization, we should call self-organizing an automaton which improves the organization of its possible learning histories. To characterize this improvement quantitatively, it is natural to use the probability-theoretic concept known as entropy.

We shall use the concept of entropy only for discrete random quantities. Let there be given a discrete random quantity $x$ with domain of definition $R$ and distribution law $f(x)$. The entropy of this quantity, or, what is the same, the entropy of the distribution $f(x)$, is the negative of the sum, taken over the domain of definition $R$ of the given random quantity, of the products of the probabilities $f(x)$ and their logarithms:

$$
H = -\sum_{x \in R} f(x) \log f(x). \tag{64}
$$


In using this equation it is assumed that for $f(x) = 0$ the product $f(x)\log f(x)$ is zero. Any positive number strictly greater than unity can be selected as the base of the logarithms. It is easy to see that changing the base of the logarithms multiplies the entropies of all distribution laws by the same constant factor (namely by $1/\log_c b$ when passing from base $c$ to base $b$). In practice, binary (base two), natural, or decimal logarithms are commonly used.
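A minimal sketch of equation (64), including the convention $0 \cdot \log 0 = 0$ and a check that changing the base of the logarithms rescales the entropy by a constant factor; the example distribution is arbitrary:

```python
from math import log

def entropy(probs, base=None):
    """Equation (64): H = -sum f(x) log f(x), with 0 * log 0 taken as 0.

    base=None means natural logarithms; otherwise logarithms to the given base (> 1).
    """
    total = 0.0
    for f in probs:
        if f > 0:
            total -= f * (log(f) if base is None else log(f, base))
    return total

dist = [0.5, 0.25, 0.125, 0.125]
h_nat = entropy(dist)            # natural units
h_bits = entropy(dist, base=2)   # binary units
print(h_bits, h_nat / log(2))    # equal: the change of base divides every entropy by ln 2
```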

As is shown in information theory (see, for example, Goldman [32]), entropy is the natural measure of the indefiniteness (uncertainty) of the values of a random quantity: the greater this indefiniteness, the larger the entropy. In particular, if a random quantity can take only two values, with probabilities $p$ and $q = 1 - p$ respectively, the entropy reaches its maximum when these probabilities are equal, $p = q = 1/2$, which corresponds to the intuitive idea of the maximal possible indefiniteness in this case. If, however, one of the probabilities $p$ or $q$ vanishes, then, as is easily seen, the entropy also vanishes, which again agrees well with common sense, since in this case there is actually no indefiniteness.

When two independent random quantities $x$ and $y$ are combined into one multivariate random quantity $z = (x, y)$ (with dimension equal to the sum of the dimensions of $x$ and $y$), the entropy of the distribution of $z$ is equal to the sum of the entropies of the distributions of $x$ and $y$.

Indeed, if the distribution laws of $x$ and $y$ are given by the functions $f_1(x)$ and $f_2(y)$, then the distribution law of $z$ is obviously given by the product $f_1(x) f_2(y)$. The entropy $H_z$ of $z$ is then calculated (using $\sum f_1 = \sum f_2 = 1$) from the equation

$$
\begin{aligned}
H_z &= -\sum_{x \in P_1} \sum_{y \in P_2} f_1(x) f_2(y) \log\big[f_1(x) f_2(y)\big] \\
&= -\sum_{y \in P_2} f_2(y) \sum_{x \in P_1} f_1(x) \log f_1(x) - \sum_{x \in P_1} f_1(x) \sum_{y \in P_2} f_2(y) \log f_2(y) = H_x + H_y,
\end{aligned}
$$

where $P_1$ and $P_2$ are the domains of definition of $x$ and $y$, and $H_x$ and $H_y$ are their entropies.

The property that the entropies of independent distributions add up when these distributions are combined into one is called the additivity property of entropy.
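The additivity property can be checked numerically on two small independent distributions; the distributions in this sketch are hypothetical:

```python
from itertools import product
from math import log

def entropy(probs):
    """Equation (64) in natural logarithms, with 0 * log 0 taken as 0."""
    return -sum(f * log(f) for f in probs if f > 0)

f1 = {"a": 0.7, "b": 0.3}        # hypothetical law of x
f2 = {0: 0.2, 1: 0.5, 2: 0.3}    # hypothetical law of y, independent of x
joint = {(x, y): f1[x] * f2[y] for x, y in product(f1, f2)}   # law of z = (x, y)
H_x, H_y, H_z = entropy(f1.values()), entropy(f2.values()), entropy(joint.values())
print(H_z, H_x + H_y)
```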

For automata operating in the simple question-response cycle (without evaluation of the quality of the response), we can approach the definition of the degree of self-organization by considering two entropy characteristics: the learning entropy and the examination entropy of the automaton. In the discussion that follows we shall basically follow the work [25].

Let $P = (p_1, p_2, \dots, p_s)$ be the sequence of questions (input words) supplied to the automaton during its learning period. We call this sequence the learning sequence. Let us assume that in some fixed series of experiments with the automaton, each learning sequence $P$ has a given probability $\rho(P)$ of appearing in the experiments of the series under consideration (it is assumed that within the given series this probability does not vary from experiment to experiment). This specifies some distribution $R$ of the probabilities $\rho(P)$ of the learning sequences. The entropy of this distribution, which we shall call the learning entropy for the given learning distribution law, is calculated from the familiar equation

$$
H^R(\text{learn}) = -\sum_{P} \rho(P) \log \rho(P). \tag{65}
$$

For definiteness we agree to use natural logarithms in computing entropies.

When the automaton $A$ and its initial state $a_0$ are fixed, every distribution of probabilities $\rho(P)$ of the learning sequences uniquely determines a distribution of probabilities $\alpha(a)$ on the set of all states of this automaton. Here $\alpha(a)$ denotes the probability that after the learning process ends the automaton will be in the state $a$. If $S_a$ denotes the event at the input of the automaton represented by the state $a$ (the set of input words transferring the automaton from the initial state into the state $a$), then

$$
\alpha(a) = \sum_{P \in S_a} \rho(P), \tag{66}
$$

where the summation extends over all words of the form $P = p_1 p_2 \cdots p_s$ contained in $S_a$ (for brevity, the sequence of words $P = (p_1, p_2, \dots, p_s)$ is identified here with the word $p_1 p_2 \cdots p_s$ composed of the elements of this sequence).

Now let us fix some probability distribution $\gamma(p)$ of the questions $p$ put to the automaton after its learning process has ended. The distributions $\alpha(a)$ and $\gamma(p)$, together with the switching and output functions of the automaton $A$, uniquely define the probability distribution $\beta(p, q)$ of the pairs "question ($p$), answer ($q$)". We call the entropy of this last distribution the examination entropy and denote it by $H^Q(\text{exam})$, where $Q$ denotes the so-called law of experimentation with the automaton, that is, the combination of two distribution laws: the distribution of the learning sequences and the distribution of the examination questions.

The quantity $H^Q(\text{exam})$ is determined from the equation

$$
H^Q(\text{exam}) = -\sum_{p, q} \beta(p, q) \log \beta(p, q). \tag{67}
$$
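Equations (66) and (67) can be turned into a short computation for a small automaton. This sketch assumes learning by independent trials with a fixed sequence length $n$, and examination questions chosen independently of the learning, so that $\beta(p, q) = \gamma(p)\sum_{a:\,A(a,p)=q}\alpha(a)$; the automaton, the dictionaries `delta` and `output`, and all probabilities are hypothetical:

```python
from itertools import product
from math import log, prod

def state_distribution(delta, a0, letter_probs, n):
    """Equation (66): alpha(a) is the total probability rho(P) of the learning words P
    (here all words of length n) that take the automaton from a0 to the state a.
    Assumes independent trials, so rho(P) is the product of the letter probabilities."""
    alpha = {}
    for word in product(letter_probs, repeat=n):
        state = a0
        for letter in word:
            state = delta[(state, letter)]
        rho = prod(letter_probs[letter] for letter in word)
        alpha[state] = alpha.get(state, 0.0) + rho
    return alpha

def exam_entropy(alpha, gamma, output):
    """Equation (67), with beta(p, q) = gamma(p) * (total alpha(a) over states a answering q to p)."""
    beta = {}
    for a, prob_a in alpha.items():
        for question, prob_q in gamma.items():
            pair = (question, output[(a, question)])
            beta[pair] = beta.get(pair, 0.0) + prob_a * prob_q
    return -sum(b * log(b) for b in beta.values() if b > 0)

# Hypothetical automaton: x leads to state 1, y to state 2; state 1 answers u, state 2 answers v.
delta = {(1, "x"): 1, (2, "x"): 1, (1, "y"): 2, (2, "y"): 2}
output = {(1, "x"): "u", (1, "y"): "u", (2, "x"): "v", (2, "y"): "v"}
alpha = state_distribution(delta, a0=1, letter_probs={"x": 0.3, "y": 0.7}, n=4)
h_exam = exam_entropy(alpha, {"x": 0.5, "y": 0.5}, output)
print(alpha, h_exam)
```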

Using, if necessary, the operation of cyclic reduction of automata, we can, without loss of generality, consider only sequences of single-letter questions and responses. The learning sequences $P$ are then converted into words consisting of the individual question-letters, arranged in the order in which they were applied to the automaton during learning.

For the further construction of the theory it is necessary to specify some class of laws of experimentation with the automaton and to assign to each law $Q$ in this class some probability or, in the case of continuous distribution laws, some probability density $\varphi(Q)$.

The simplest case is the scheme of independent trials, in which at each step, both in the learning regime and in the examination regime, the probability $\gamma(p)$ of the appearance of any given question is constant and depends only on this question. Since we restrict ourselves to 1-cycle automata, specifying the law $Q$ of experimentation with the automaton is equivalent in this case to assigning certain probabilities $v_i = v(x_i)$ of the appearance at the input of the automaton of each letter $x_i$ of its input alphabet. The sum of all the $v_i$ must, of course, be equal to unity.


In the case of the scheme of independent trials, the law of experimentation with the automaton is naturally identified with the vector $v = (v_1, v_2, \dots)$ of the probabilities of the appearance of the different input letters (the input alphabet is assumed to be ordered in some way). The class of laws is naturally identified with the set of all vectors $v = (v_1, v_2, \dots)$ satisfying the natural restrictions $0 \le v_i \le 1$ and $\sum_i v_i = 1$ ($i = 1, 2, \dots$), with a uniform distribution law given on this set. We agree to call the scheme of independent trials with this choice of the class of distribution laws the uniform scheme of independent trials. Here we limit ourselves to the case when the length of the learning sequence is fixed, or, if necessary, we shall assume that these lengths are described by some distribution law (most frequently the Poisson law).
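A law of the uniform scheme of independent trials can be drawn at random. The sketch below uses a standard sampling technique that is not taken from the text: normalizing independent exponentially distributed numbers gives a vector distributed uniformly on the set $\{v_i \ge 0,\ \sum v_i = 1\}$. The alphabet size and the seed are arbitrary:

```python
import random

def uniform_law(alphabet_size: int, rng: random.Random) -> list:
    """Draw v = (v_1, ..., v_m) uniformly from {v_i >= 0, sum v_i = 1}.

    Normalized i.i.d. exponential variables are uniform on this set
    (the Dirichlet(1, ..., 1) law).
    """
    draws = [rng.expovariate(1.0) for _ in range(alphabet_size)]
    total = sum(draws)
    return [d / total for d in draws]

rng = random.Random(2024)
samples = [uniform_law(3, rng) for _ in range(20_000)]
mean_first = sum(v[0] for v in samples) / len(samples)
print(samples[0], round(mean_first, 3))   # under the uniform law each coordinate has mean 1/3
```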

If some law of experimentation $Q$ is given, then, as noted above, it includes two distribution laws: the distribution law of the learning sequences and the distribution law of the examination questions. In the usual organization of experiments with self-improving automata the corresponding random quantities are to be regarded as independent. Therefore the entropy of the joint distribution of these two quantities, which we agree to call the entropy of the corresponding law of experimentation $Q$ and denote by $H^Q$, equals the sum of two entropies: the learning entropy $H^Q(\text{learn})$ and the entropy of the examination questions $H^Q(\text{quest})$. The latter must not be confused with the examination entropy $H^Q(\text{exam})$, which relates not to the distribution of the examination questions but to the distribution of the question-response pairs. The examination entropy depends not only on the distribution of the questions and that of the learning sequences but also on the automaton itself, whereas the entropy of the law of experimentation does not depend on the automaton.
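For the scheme of independent trials the additivity of $H^Q$ can be checked directly; the learning entropy of sequences of fixed length $n$ also turns out to be $n$ times the entropy of a single letter. The letter and question probabilities in this sketch are hypothetical:

```python
from itertools import product
from math import log, prod

def entropy(probs):
    """Equation (64) in natural logarithms, with 0 * log 0 taken as 0."""
    return -sum(f * log(f) for f in probs if f > 0)

v = {"x": 0.2, "y": 0.8}        # hypothetical letter probabilities in the learning regime
gamma = {"x": 0.5, "y": 0.5}    # hypothetical distribution of the examination questions
n = 5                           # fixed length of the learning sequences

rho = [prod(v[c] for c in word) for word in product(v, repeat=n)]   # rho(P) for every P
H_learn = entropy(rho)
H_quest = entropy(gamma.values())
# joint law of (learning sequence, examination question), assuming independence
H_Q = entropy(r * g for r in rho for g in gamma.values())
print(H_Q, H_learn + H_quest, n * entropy(v.values()))
```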
Let us assume that in the class $K$ of laws of experimentation with the automaton some law $Q_0$ is fixed which has the maximal possible entropy $H^{Q_0}$. Introducing the increments of the entropies of experimentation and of examination by the equations

$$
\Delta H^Q = H^Q - H^{Q_0}, \qquad \Delta H^Q(\text{exam}) = H^Q(\text{exam}) - H^{Q_0}(\text{exam}), \tag{68}
$$

we are able, for any automaton $A$ and class $K$ of laws of experimentation with it (with probability density $\varphi(Q)$), to introduce two averaged characteristics:

$$
s(A, K) = -\int_K \Delta H^Q(\text{exam})\, \varphi(Q)\, dQ, \tag{69}
$$

$$
\sigma(A, K) = \int_K \frac{\Delta H^Q(\text{exam})}{\Delta H^Q}\, \varphi(Q)\, dQ. \tag{70}
$$


The integrals in these equations are taken over the region consisting of all the laws of the class $K$ under consideration. The larger the values of these integrals, the greater the average capability of the automaton $A$ for self-organization. The value zero corresponds to the absence of a capability for self-organization, and negative values indicate that as the organization of the learning improves, the organization of the automaton's responses deteriorates on the average. In other words, the automaton then behaves as a "self-disorganizing" system rather than as a "self-organizing" one.

Since equation (70) leads to considerably more complex computations than equation (69), we shall select as the basic quantitative criterion of the capability of an automaton for self-organization the criterion $s(A, K)$ rather than the criterion $\sigma(A, K)$.

Let us consider as an example two automata $A$ and $B$ whose switching and output functions are given by the following tables (an entry shows the next state and the output signal for the given current state and input signal; the initial state is 1):

    for automaton A      state 1    state 2
        x                1, u       1, v
        y                2, u       2, v

    for automaton B      state 1    state 2
        x                1, u       2, v
        y                2, u       2, v

In these tables the numbers 1 and 2 denote the states of the automata, the letters $x$, $y$ denote the input signals (questions), and the letters $u$, $v$ denote the output signals (responses).

As the class $K$ of distribution laws we select the class in which the examination questions $x$ and $y$ occur with equal probabilities, and the distribution laws of the learning sequences result from the scheme of independent trials with the questions $x$ and $y$ occurring with probabilities $p$ and $1 - p$ respectively ($p$ runs through all values from 0 to 1 within the class $K$, with uniform probability). In addition, we fix the length $n$ of the learning sequences and denote the criterion $s(A, K)$ corresponding to the selected value of $n$ by $s_n(A, K)$.

The automaton $A$ will be in state 1 if the last question given to it during learning was $x$, and in state 2 if the last question was $y$. This implies that the probabilities of the question-response pairs are: for the pair $(x, u)$, $\frac{1}{2}p$; for the pair $(y, u)$, $\frac{1}{2}p$; for the pair $(x, v)$, $\frac{1}{2}(1 - p)$; and for the pair $(y, v)$, also $\frac{1}{2}(1 - p)$. Consequently, the examination entropy is

$$
\begin{aligned}
H^Q(\text{exam}) &= -\tfrac{1}{2}p \ln\big(\tfrac{1}{2}p\big) - \tfrac{1}{2}p \ln\big(\tfrac{1}{2}p\big) - \tfrac{1}{2}(1 - p) \ln\big(\tfrac{1}{2}(1 - p)\big) - \tfrac{1}{2}(1 - p) \ln\big(\tfrac{1}{2}(1 - p)\big) \\
&= -p \ln p - (1 - p) \ln(1 - p) - \ln \tfrac{1}{2}.
\end{aligned}
$$
The entropy of the law of experimentation is obviously maximal when $p = 1 - p = 1/2$. In this case the examination entropy is given by

$$
H^{Q_0}(\text{exam}) = -\tfrac{1}{2}\ln\tfrac{1}{2} - \tfrac{1}{2}\ln\tfrac{1}{2} - \ln\tfrac{1}{2} = -\ln\tfrac{1}{4}.
$$

The increment of the examination entropy is

$$
\Delta H^Q(\text{exam}) = -p \ln p - (1 - p) \ln(1 - p) + \ln \tfrac{1}{2}.
$$

The probability density of the laws $Q$ in the selected class is clearly equal to unity. Application of equation (69) gives

$$
s_n(A, K) = \int_0^1 \Big(p \ln p + (1 - p) \ln(1 - p) - \ln \tfrac{1}{2}\Big)\, dp = \ln 2 - \tfrac{1}{2} \approx 0.19.
$$
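The value of $s_n(A, K)$ can be confirmed numerically by integrating $-\Delta H^Q(\text{exam})$ over $p$ with the midpoint rule; this sketch builds the examination entropy from the pair probabilities given above (the number of integration steps is an arbitrary choice):

```python
from math import log

def pair_entropy(pair_probs):
    """Equation (67) for a list of question-response pair probabilities."""
    return -sum(b * log(b) for b in pair_probs if b > 0)

def exam_entropy_A(p):
    # automaton A: P(state 1) = p (last learning question was x); questions x, y equally likely
    return pair_entropy([p / 2, p / 2, (1 - p) / 2, (1 - p) / 2])

def s_criterion(exam_entropy, steps=20_000):
    """Equation (69) for the class K of the example: phi(Q) = 1 for p in [0, 1], Q_0 at p = 1/2."""
    h0 = exam_entropy(0.5)
    h = 1.0 / steps
    # midpoint rule for -integral_0^1 (H^Q(exam) - H^{Q_0}(exam)) dp
    return -sum(exam_entropy((i + 0.5) * h) - h0 for i in range(steps)) * h

s_A = s_criterion(exam_entropy_A)
print(round(s_A, 4), round(log(2) - 0.5, 4))
```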

The automaton $B$ will be in state 1 only when the learning sequence consists of $x$'s alone. The probability of this is obviously $p^n$. Hence the probabilities of the examination pairs $(x, u)$ and $(y, u)$ are equal to $\frac{1}{2}p^n$, and the probabilities of the pairs $(x, v)$ and $(y, v)$ are equal to $\frac{1}{2}(1 - p^n)$. Proceeding as in the calculation of $s_n(A, K)$, we find $\Delta H^Q(\text{exam})$ and obtain

$$
\begin{aligned}
s_n(B, K) &= \int_0^1 \Big(p^n \ln p^n + (1 - p^n)\ln(1 - p^n) - \tfrac{1}{2^n}\ln\tfrac{1}{2^n} - \big(1 - \tfrac{1}{2^n}\big)\ln\big(1 - \tfrac{1}{2^n}\big)\Big)\, dp \\
&= \frac{n \ln 2}{2^n} - \Big(1 - \frac{1}{2^n}\Big)\ln\Big(1 - \frac{1}{2^n}\Big) + \int_0^1 (1 - p^n)\ln(1 - p^n)\, dp - \frac{n}{(n + 1)^2}.
\end{aligned}
$$
From this expression it is easily found that for $n \ge 5$ the quantity $s_n(B, K)$ is negative. In other words, when learning uses sequences of length greater than 4, the automaton $B$ is, in the selected class of laws of experimentation, "self-disorganizing" on the average, while the automaton $A$ under the same conditions shows a capability for self-organization.
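The same midpoint-rule computation applies to the automaton $B$, with $p^n$ in place of $p$. This sketch evaluates $s_n(B, K)$ for several $n$; for $n = 1$ the automata $A$ and $B$ behave identically, which gives a simple check against $\ln 2 - \tfrac{1}{2}$:

```python
from math import log

def pair_entropy(pair_probs):
    """Equation (67) for a list of question-response pair probabilities."""
    return -sum(b * log(b) for b in pair_probs if b > 0)

def exam_entropy_B(p, n):
    # automaton B stays in state 1 only if all n learning questions were x: probability p**n
    s1 = p ** n
    return pair_entropy([s1 / 2, s1 / 2, (1 - s1) / 2, (1 - s1) / 2])

def s_n_B(n, steps=20_000):
    """Midpoint-rule value of equation (69) for automaton B in the class K of the example."""
    h0 = exam_entropy_B(0.5, n)
    h = 1.0 / steps
    return -sum(exam_entropy_B((i + 0.5) * h, n) - h0 for i in range(steps)) * h

for n in range(1, 9):
    print(n, round(s_n_B(n), 4))
```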

We note that the conclusion about the capability or incapability of an automaton for self-organization depends on the choice of the class of laws of experimentation with it. If, for example, we select as $K$ in the example above the class of laws of experimentation resulting from the uniform scheme of independent trials, then, as is easily verified, the automaton $B$ also becomes self-organizing on the average, although the magnitude of its self-organization remains less than that of the automaton $A$.

In passing from the concept of self-organization to the concept of self-learning, we can no longer be satisfied with purely probability-theoretic concepts. We must introduce concepts that characterize the particular direction of the self-organization process. The most natural way to do this is to introduce a real function $f(p, q)$, defined on the set of all possible question ($p$), response ($q$) pairs, whose value characterizes the quality of any response $q$ to any given question $p$.

As noted above, for any given automaton $A$ with fixed initial state $a_0$, specifying the law $Q$, i.e., the probability distribution $\rho(P)$ on the learning sequences $P$, uniquely determines the probability distribution $\alpha(a)$ on the set of states of the automaton. Let us further denote by $q = A(a, p)$ the response to the question $p$ of the automaton $A$ previously brought into the state $a$. The quantity

$$
f^Q = \sum_{p, a} f\big(p, A(a, p)\big)\, \gamma(p)\, \alpha(a)
$$

is the averaged criterion of the quality of the automaton's responses in the "examination" after it has been taught by sequences distributed according to the law $Q$ ($\gamma(p)$ is the probability that the question $p$ appears in the examination).

It is natural to call the difference $f^Q - f^{Q_0}$ the self-learning content of the automaton $A$, where $Q_0$ is the a priori probability distribution law of the learning sequences, known to the designer when the automaton is constructed, and $Q$ is the a posteriori distribution law actually realized for some class of learning experiments. As a rule, the entropy of the distribution $Q_0$ is greater than the entropy of the distribution $Q$.

If now a class $K$ of a posteriori distribution laws $Q$ is given, with probability density $\varphi(Q)$, then the integral

$$
b(A, K) = \int_K \big(f^Q - f^{Q_0}\big)\, \varphi(Q)\, dQ
$$

is the averaged quantitative characteristic of the capability of the automaton for self-learning (for the selected class $K$, automaton $A$ and real function $f$).
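The self-learning characteristic $b(A, K)$ can be illustrated on the automaton $A$ of the example above. The quality function `f`, the a priori law $Q_0$ (both questions equally likely during learning) and the class $K$ (probability of $x$ uniform on $[1/2, 1]$) in this sketch are hypothetical choices made only for illustration:

```python
def quality_average(alpha, gamma, response, f):
    """f^Q = sum over questions p and states a of f(p, A(a, p)) * gamma(p) * alpha(a)."""
    return sum(f(p, response(a, p)) * gamma[p] * alpha[a] for p in gamma for a in alpha)

def response_A(a, p):
    """Automaton A of the example: state 1 answers u, state 2 answers v, whatever the question."""
    return "u" if a == 1 else "v"

def f(p, q):
    """Hypothetical quality function: the response u counts as correct, v as wrong."""
    return 1.0 if q == "u" else 0.0

gamma = {"x": 0.5, "y": 0.5}   # examination questions equally likely

def f_Q(prob_x):
    # after learning by independent trials, A is in state 1 with probability prob_x
    return quality_average({1: prob_x, 2: 1 - prob_x}, gamma, response_A, f)

f_Q0 = f_Q(0.5)                # hypothetical a priori law Q_0
# hypothetical class K: prob_x uniform on [1/2, 1], so phi(Q) = 2 there; midpoint rule
steps = 10_000
h = 0.5 / steps
b = sum((f_Q(0.5 + (i + 0.5) * h) - f_Q0) * 2.0 for i in range(steps)) * h
print(round(b, 4))
```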

§4. AUTOMATA WITH RANDOM TRANSITIONS 

In addition to determinate automata, the theory of self-organizing systems must also consider automata with random transitions. As is known (see Chapter 3), in a determinate automaton the preceding state $a(t - 1)$ and the current input signal $x(t)$ uniquely determine the next state $a(t)$ into which the automaton passes from the state $a(t - 1)$ under the influence of this input signal. In an automaton with random transitions the pair $a(t - 1)$, $x(t)$ determines only the probability $p_{ij}(x)$ of the transition of the automaton from the state $a(t - 1)$, which we denote by $a_i$, into any other state $a_j$ under the influence of the input signal $x(t) = x$.
It is easy to see that a determinate automaton can be regarded as a particular case of an automaton with random transitions in which, for each $x$ and any given $i$, exactly one of the probabilities $p_{ij}(x)$ equals unity and all the remaining ones equal zero.

It is natural to specify every automaton with random transitions by the system of matrices $\|p_{ij}(x)\|$, where $x$ runs in turn through all the input signals of the automaton. Of course, in addition to these matrices, the output functions and the initial state of the automaton must also be given.

The matrices $\|p_{ij}(x)\|$ have the property that the sum of the elements of any of their rows equals unity. We shall also assume that the automaton has no states for which the probabilities of transition from all the other states are zero. This means, obviously, that the matrices $\|p_{ij}(x)\|$ have no columns consisting entirely of zeros. In addition, all the elements of each matrix $\|p_{ij}(x)\|$ are nonnegative real numbers not exceeding unity.

Matrices satisfying the three properties listed are customarily called stochastic matrices. Thus, for automata with random transitions the role of the switching function is played by the function $\|p_{ij}(x)\|$, which maps the set of all input signals of the automaton uniquely into the set of stochastic matrices.
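A sketch of an automaton with random transitions: it checks the three properties of a stochastic matrix listed above and performs random transitions according to $p_{ij}(x)$. The two-state automaton, its matrices and the input word are hypothetical:

```python
import random

def is_stochastic(matrix, tol=1e-12):
    """The three properties of the text: entries in [0, 1], row sums 1, no all-zero column."""
    n = len(matrix)
    entries_ok = all(0.0 <= x <= 1.0 for row in matrix for x in row)
    rows_ok = all(abs(sum(row) - 1.0) <= tol for row in matrix)
    cols_ok = all(any(matrix[i][j] > 0 for i in range(n)) for j in range(n))
    return entries_ok and rows_ok and cols_ok

def step(state, x, transitions, rng):
    """One random transition: from state a_i under input x go to a_j with probability p_ij(x)."""
    row = transitions[x][state]
    return rng.choices(range(len(row)), weights=row)[0]

# Hypothetical automaton with states 0, 1 and input signals x, y.
transitions = {
    "x": [[0.9, 0.1], [0.5, 0.5]],
    "y": [[0.0, 1.0], [0.2, 0.8]],
}
assert all(is_stochastic(m) for m in transitions.values())
rng = random.Random(1)
state = 0
for signal in "xxyyx":
    state = step(state, signal, transitions, rng)
print(state)
```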

Of particular interest are automata with random transitions that have a single (constant) input signal. Such automata are studied in classical probability theory under the name of homogeneous (discrete) Markov chains. The output signals of such automata are ignored (or identified with the states), which allows them to be specified by a single stochastic matrix. For definiteness, it is customarily assumed that the first row (and the first column as well) of this matrix corresponds to the initial state of the automaton (Markov chain).

The Markov chains also have another (non-automaton) interpretation,
in the terms customarily used in probability theory. This
interpretation is based on the concept of trials, considered in the
preceding section. However, here we must consider not independent
trials, but trials in which the probabilities of particular outcomes
of each successive trial depend on the outcome of the directly
preceding trial and do not depend directly on the outcomes of all the
remaining trials (the set of possible outcomes itself does not change
from trial to trial). There then arises the matrix $\|p_{ij}\|$ of the
so-called transition probabilities. Any element $p_{ij}$ of this matrix is
the probability of the jth outcome in each successive trial under the
condition that the outcome of the trial directly preceding it was the
ith.

It is easy to see that such a treatment is completely equivalent to
the automaton treatment: the trial outcome is, essentially, simply
another name for the state of the automaton with random transitions
(having a single input signal). There is, it is true, one difference:
in the automaton with random transitions a completely determined
initial state was fixed, while in the Markov chains it is customary to
specify the probabilities $p_1, p_2, \ldots, p_n$ of the various outcomes
of the initial trial. This corresponds to a random selection of the
initial state of the automaton, so that the ith state is selected as
the initial state with the probability $p_i$ (i = 1, 2, ..., n).
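A short simulation may make the correspondence between the two interpretations concrete: drawing the initial state with probabilities $p_i$ and then drawing each next state from the corresponding row of the matrix yields the sequence of trial outcomes. The function name and the use of Python's `random.choices` are choices of this sketch, not part of the text; states are numbered from 0.

```python
import random

def simulate_chain(P, initial, steps, rng=None):
    """Sample a trajectory of a uniform Markov chain.

    P       -- transition matrix (list of rows), P[i][j] = p_ij
    initial -- initial distribution p_1, ..., p_n
    steps   -- number of cycles to simulate
    Returns the visited states (0-based indices), of length steps + 1.
    """
    rng = rng or random.Random()
    states = range(len(P))
    # Random selection of the initial state with probabilities p_i.
    current = rng.choices(states, weights=initial)[0]
    path = [current]
    for _ in range(steps):
        # One cycle: the next state is drawn from row `current` of P.
        current = rng.choices(states, weights=P[current])[0]
        path.append(current)
    return path

# A determinate chain that alternates between its two states.
print(simulate_chain([[0, 1], [1, 0]], [1, 0], 5, random.Random(0)))
```

With a zero/one matrix and a degenerate initial distribution the trajectory is fully determined, which mirrors the remark that the determinate automaton is a special case of the random one.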

Thus, for a more complete analogy with the Markov chains it is
necessary to consider not the simple automata with random transitions
(having a single input signal) but the so-called random automata, in
which not only the transition function but also the selection of the
initial state is random; and if, in addition, the output signals are
taken into consideration, then the output function must, generally
speaking, also be random. In other words, the output function must
specify not simply the output signal, but some probability
distribution on the set of all possible output signals.

The Markov chains (or, correspondingly, the random automata) are
termed finite or infinite depending on whether the set of possible
outcomes (or, correspondingly, the set of states of the automaton) is
finite or infinite. For random finite automata of general form we shall
also require finiteness of the sets of their input and output signals.
We shall limit ourselves to the study of only the uniform Markov chains,
i.e., those chains in which the matrix of the transition probabilities
is constant. We will not encounter nonuniform Markov chains (with a
matrix of transition probabilities depending on time) in what follows.
Therefore, for brevity, we shall speak only of Markov chains, meaning
each time, unless otherwise stipulated, uniform chains.

Let us consider the automaton A with random transitions (Markov
chain) whose transition probability matrix is $P = \|p_{ij}\|$. As
mentioned above, the arbitrary element $p_{ij}$ of this matrix is the
probability of the transition of the automaton A from the ith state
into the jth. It is important to emphasize that here we are speaking of
the transition in one cycle (i.e., in the interval between two
neighboring moments of discrete automaton time). It is easy to see that
the product $p_{ik} p_{kj}$ is the probability of the transition of the
automaton A from the ith state into the jth in two cycles under the
condition that the automaton passes through the kth state.

The sum $\sum_k p_{ik} p_{kj}$, extended over all the states k,
obviously gives the total probability of the transition of the
automaton A from the ith state into the jth in two cycles. Moreover,
this sum is clearly the element of the matrix $P \cdot P = P^2$ standing at
the intersection of the ith row and the jth column. We obtain
similarly: the probability of the transition of the automaton A from
the ith state into the jth after three cycles of operation is equal to
the (i, j)th element of the matrix $P^2 \cdot P = P^3$. Continuing
similarly, we arrive at the following proposition.

Theorem 1. For the random automaton A with a single input signal
(uniform Markov chain) whose transition probability matrix is P, the
probability of transition from the ith state into the jth after
exactly n cycles is equal to the element of the matrix $P^n$ standing at
the intersection of the ith row and the jth column (n = 1, 2, 3, ...).
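Theorem 1 is easy to check numerically. The Python sketch below uses exact fractions; the 3 × 3 matrix is an arbitrary illustrative choice, not taken from the text. It forms $P^n$ by repeated multiplication and compares one two-step entry with the sum over intermediate states described above.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def n_step(P, n):
    """P^n: its (i, j) entry is the probability of passing from state i
    to state j in exactly n cycles (Theorem 1)."""
    size = len(P)
    result = [[Fraction(int(i == j)) for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, P)
    return result

# Illustrative 3-state chain (not taken from the text).
P = [[Fraction(1, 2), Fraction(1, 2), Fraction(0)],
     [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)],
     [Fraction(0),    Fraction(1, 3), Fraction(2, 3)]]

# Two cycles from state 1 to state 3, summed over the intermediate state k.
two_cycles = sum(P[0][k] * P[k][2] for k in range(3))
print(n_step(P, 2)[0][2], two_cycles)   # prints: 1/8 1/8
```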

It is natural to term the elements of the matrix $P^n$ the transition
probabilities after n steps. For the determination of these
probabilities in the case of finite Markov chains we usually make use
of the so-called Perron equation, which is derived in matrix theory.
Let us first recall certain definitions and concepts of this theory.

Let there be given the matrix $P = \|p_{ij}\|$ of nth order. The
determinant

$$P(\lambda) = \begin{vmatrix} \lambda - p_{11} & -p_{12} & \cdots & -p_{1n} \\ -p_{21} & \lambda - p_{22} & \cdots & -p_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ -p_{n1} & -p_{n2} & \cdots & \lambda - p_{nn} \end{vmatrix}$$

is a polynomial of nth degree in $\lambda$, termed the characteristic
polynomial of the matrix P. The roots of this polynomial are termed the
eigenvalues of the matrix P.

Let us denote by E the unit matrix of nth order. Then the element
of the matrix $(\lambda E - P)^{-1}$ located at the intersection of the ith
row and the jth column is equal to $P_{ji}(\lambda)/P(\lambda)$, where
$P_{ji}(\lambda)$ is the algebraic complement (cofactor) of the element of
the determinant $P(\lambda)$ located at the intersection of its jth row and
ith column.
Now let the matrix $P = \|p_{ij}\|$ have the distinct eigenvalues
$\lambda_1, \lambda_2, \ldots, \lambda_r$. We denote by $m_s$ the multiplicity of the sth
eigenvalue $\lambda_s$, i.e., the maximal number m such that the
characteristic polynomial $P(\lambda)$ is divisible by $(\lambda - \lambda_s)^m$, and we
define the polynomial $\psi_s(\lambda)$ by the equation

$$\psi_s(\lambda) = \frac{P(\lambda)}{(\lambda - \lambda_s)^{m_s}}.$$

Then the element $p_{ij}^{(k)}$ of the matrix $P^k$ located at the
intersection of the ith row and the jth column can be determined from
the equation

$$p_{ij}^{(k)} = \sum_{s=1}^{r} \frac{1}{(m_s - 1)!} \left[ \frac{d^{\,m_s - 1}}{d\lambda^{m_s - 1}} \left( \frac{\lambda^k P_{ji}(\lambda)}{\psi_s(\lambda)} \right) \right]_{\lambda = \lambda_s}. \tag{71}$$

Equation (71) is the Perron equation. In it $d^{\,m_s - 1}/d\lambda^{m_s - 1}$
denotes the derivative with respect to $\lambda$ of order $m_s - 1$. The
substitution $\lambda = \lambda_s$ must be performed after the differentiation.
The derivation of the Perron equation can be found in any monograph on
the theory of finite Markov chains (see, for example, Romanovskiy [68]).


The Perron equation takes a particularly simple form in the case
when all the eigenvalues of the matrix P have multiplicity equal to
unity, i.e., when $m_1 = m_2 = \cdots = m_r = 1$. It is clear that in this
case r = n. Since the factorial of zero is unity, and the derivative of
zero order denotes the absence of any differentiation, for this
particular case we obtain the simple equation

$$p_{ij}^{(k)} = \sum_{s=1}^{n} \frac{\lambda_s^k P_{ji}(\lambda_s)}{\psi_s(\lambda_s)} \qquad (i, j = 1, 2, \ldots, n). \tag{72}$$
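The special Perron equation can be checked against direct matrix powers on a small example. The Python sketch below (exact fractions; function names are illustrative) assumes the distinct eigenvalues are supplied by the caller, and uses the fact that for a monic characteristic polynomial with simple roots $\psi_s(\lambda_s) = \prod_{t \ne s} (\lambda_s - \lambda_t)$. The cofactor $P_{ji}$ is taken from the matrix $\lambda E - P$, whose determinant is $P(\lambda)$.

```python
from fractions import Fraction

def det(M):
    """Determinant by Gaussian elimination over exact fractions."""
    M = [[Fraction(x) for x in row] for row in M]
    n, result = len(M), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            result = -result
        result *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return result

def cofactor(M, row, col):
    """Algebraic complement of the element M[row][col]."""
    minor = [r[:col] + r[col + 1:] for k, r in enumerate(M) if k != row]
    return (-1) ** (row + col) * det(minor)

def char_matrix(P, lam):
    """The matrix lam*E - P, whose determinant is P(lam)."""
    n = len(P)
    return [[(lam if i == j else 0) - P[i][j] for j in range(n)] for i in range(n)]

def perron_simple(P, eigenvalues, k):
    """P^k from the special Perron equation (72); eigenvalues must be distinct."""
    n = len(P)
    result = [[Fraction(0)] * n for _ in range(n)]
    for s, lam in enumerate(eigenvalues):
        # psi_s(lam_s) = product over the other eigenvalues of (lam_s - lam_t).
        psi = Fraction(1)
        for t, other in enumerate(eigenvalues):
            if t != s:
                psi *= lam - other
        C = char_matrix(P, lam)
        for i in range(n):
            for j in range(n):
                # P_ji: cofactor of the element in row j, column i.
                result[i][j] += lam ** k * cofactor(C, j, i) / psi
    return result

# Triangular stochastic matrix: its eigenvalues are the diagonal entries.
P = [[Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)],
     [Fraction(0),    Fraction(3, 4), Fraction(1, 4)],
     [Fraction(0),    Fraction(0),    Fraction(1)]]
eigenvalues = [Fraction(1, 2), Fraction(3, 4), Fraction(1)]
print(perron_simple(P, eigenvalues, 3))
```

The triangular example is chosen only because its eigenvalues (1/2, 3/4, 1) can be read off the diagonal; for a general matrix the eigenvalues would have to be found separately.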


We shall term equation (71) the general, and equation (72) the
special Perron equation. Equations (71) and (72) permit the solution of
one very important problem of the theory of finite Markov chains: the
problem of finding the so-called limit distribution. If the limit
$\lim_{k \to \infty} P^k = P^{(\infty)}$ exists, then it is natural to term the
elements of the matrix $P^{(\infty)} = \|p_{ij}^{(\infty)}\|$ the limit transition
probabilities. Having the initial distribution, i.e., the probabilities
$p_1, p_2, \ldots, p_n$ of the various outcomes of the initial trial, we can
obtain the probabilities $p_j^{(\infty)}$ of the various states in the limit
distribution from the equations

$$p_j^{(\infty)} = \sum_{i=1}^{n} p_i \, p_{ij}^{(\infty)} \qquad (j = 1, 2, \ldots, n). \tag{73}$$
It appears natural to assume that after a sufficiently large number
of transitions of the random automaton characterizing the Markov chain,
the effect of the initial distribution of the state probabilities on
the distribution of the states obtained as the result of these
transitions can be made arbitrarily small. In other words, the limit
distribution obtained using equation (73) must not depend on the
initial distribution $(p_1, p_2, \ldots, p_n)$. If the limit distribution has
this property, then the corresponding Markov chain is termed ergodic.
The ergodicity property will obviously hold if and only if, for any
given j, all the elements $p_{ij}^{(\infty)}$ (i = 1, 2, ..., n) are identical,
i.e., in other words, when all the rows of the matrix of the limit
transition probabilities are identical.

It can be shown that the moduli of the eigenvalues of stochastic
matrices cannot exceed unity. It is also not difficult to see that
unity is an eigenvalue of every stochastic matrix (since the rows sum
to unity, the matrix maps the column of all ones into itself). If all
the remaining (non-unity) eigenvalues of the stochastic matrix M are
strictly less than unity in modulus, then the matrix M and the
corresponding Markov chain are termed proper. If in the proper
stochastic matrix P unity is a simple root of the characteristic
polynomial, then the matrix P and the corresponding Markov chain are
termed regular.

The following proposition is valid [14]. 

Theorem 2. The Markov chain C with a finite number of states has a
limit distribution if and only if it is proper. In order that the chain
C possess the property of ergodicity, it is necessary and sufficient
that it be a regular chain.
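For a 2 × 2 stochastic matrix Theorem 2 can be observed directly, because its eigenvalues are 1 and $p_{11} + p_{22} - 1$ (the trace minus one). The Python sketch below is restricted to this 2 × 2 case (an assumption made so that no general eigenvalue routine is needed). It classifies the matrix and approximates the limit distribution (73) by propagating an initial distribution over many cycles; the three example matrices are illustrative.

```python
def second_eigenvalue(P):
    """Eigenvalues of a 2 x 2 stochastic matrix are 1 and trace(P) - 1."""
    return P[0][0] + P[1][1] - 1

def classify(P):
    """Proper / regular classification of a 2 x 2 stochastic matrix."""
    lam2 = second_eigenvalue(P)
    if lam2 == 1:
        # Unity is a double root: proper (no other eigenvalues), not regular.
        return "proper, not regular"
    if abs(lam2) < 1:
        return "regular"
    return "not proper"

def propagate(P, initial, steps):
    """Distribution of the states after `steps` cycles: p^(0) P^steps."""
    dist = list(initial)
    n = len(P)
    for _ in range(steps):
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return dist

regular = [[0.8, 0.2], [0.6, 0.4]]   # limit exists, rows of P^inf identical
identity = [[1.0, 0.0], [0.0, 1.0]]  # limit exists but depends on the start
swap = [[0.0, 1.0], [1.0, 0.0]]      # eigenvalue -1: no limit at all
for name, P in [("regular", regular), ("identity", identity), ("swap", swap)]:
    print(name, classify(P),
          propagate(P, [1.0, 0.0], 200), propagate(P, [0.0, 1.0], 200))
```

For the regular matrix both starting distributions approach (0.75, 0.25); for the identity matrix the distribution never moves; for the swap matrix it oscillates between the two states, so no limit exists.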

For the case of a regular finite Markov chain the limit transition
probabilities are determined by the equations

$$p_{ij}^{(\infty)} = \frac{P_{ji}(1)}{\psi_1(1)} \qquad (i, j = 1, 2, \ldots, n), \tag{74}$$

where $\lambda_1 = 1$ denotes the unit eigenvalue. These equations are
obtained from the special Perron equation (72) as the result of the
limit transition as $k \to \infty$.
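Equation (74) needs only cofactors of $E - P$ and the number $\psi_1(1)$. Since $P(\lambda) = (\lambda - 1)\psi_1(\lambda)$, we have $\psi_1(1) = P'(1)$, and the derivative of $\det(\lambda E - P)$ equals the sum of its principal cofactors (Jacobi's formula). The Python sketch below relies on these two facts; the function names and the test matrices are illustrative assumptions.

```python
from fractions import Fraction

def det(M):
    """Determinant by Gaussian elimination over exact fractions."""
    M = [[Fraction(x) for x in row] for row in M]
    n, result = len(M), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            result = -result
        result *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return result

def cofactor(M, row, col):
    """Algebraic complement of the element M[row][col]."""
    minor = [r[:col] + r[col + 1:] for k, r in enumerate(M) if k != row]
    return (-1) ** (row + col) * det(minor)

def limit_matrix(P):
    """Limit transition probabilities of a regular chain by equation (74).

    psi_1(1) = P'(1) is computed as the sum of the principal cofactors
    of E - P (Jacobi's formula for the derivative of a determinant).
    """
    n = len(P)
    C = [[(1 if i == j else 0) - P[i][j] for j in range(n)] for i in range(n)]
    psi1 = sum(cofactor(C, k, k) for k in range(n))
    return [[cofactor(C, j, i) / psi1 for j in range(n)] for i in range(n)]

p0, p1 = Fraction(1, 5), Fraction(3, 5)
print(limit_matrix([[1 - p0, p0], [p1, 1 - p1]]))
```

All rows of the result coincide, as Theorem 2 requires for a regular chain. The formula applies only when unity is a simple eigenvalue; otherwise the computed $\psi_1(1)$ is zero.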

The results described above permit constructing the theory of the
behavior of automata (random and deterministic) in random media. We
shall limit ourselves to the consideration of only Moore automata,
since in the case of Mealy automata certain complications of the theory
become necessary which make it less easily visualized. We also agree to
consider deterministic automata as a particular case of random
automata, which, as mentioned above, is always possible.

With these assumptions every automaton A can be specified by the
matrix $L = \|\lambda_{ij}\|$ of the output probabilities and by the family of
matrices $P^{(m)} = \|p_{ik}^{(m)}\|$ of the transition probabilities. Any
element $\lambda_{ij}$ of the first matrix is equal to the probability of the
appearance of the jth output signal in the case when the automaton A is
in the ith state. The quantity $p_{ik}^{(m)}$ is the probability of the
transition of the automaton from the ith state into the kth under the
influence of the mth input signal.

The medium is specified for some class K of automata having
identical sets of input signals $(x_1, x_2, \ldots, x_l)$ and identical sets
of output signals $(v_1, v_2, \ldots, v_q)$. Specification of the medium for
the considered class K means the specification of the dependence of the
input signal x(t) of any automaton A from the class K at an arbitrary
moment t of discrete automaton time on the value v(t - 1) of its output
signal at the moment of time directly preceding t. It is assumed that
this dependence is the same for all the automata from the given class
K. In other words, the behavior of the medium is determined only by the
actions (output signals) of the automata and does not depend directly
on the internal arrangement of the automata.

Let us consider the random media in which the so-called reaction
probabilities $r_{jm}$ are defined; they are combined into the
(rectangular) reaction probability matrix $R = \|r_{jm}\|$. The value of
$r_{jm}$ is taken to be equal to the probability of the appearance of the
mth input signal at the input of the automaton A (from the class K)
operating in the considered medium if at the directly preceding moment
of time the automaton A delivered the jth output signal.

If the medium reaction probabilities are constant, the
corresponding medium is termed a stationary random medium. In
nonstationary random media the reaction probabilities can change with
time. Just as in the case of automata, deterministic media (with a
rigorously defined functional relationship x(t) = f(v(t - 1))) can be
considered as a particular case of random media.

It is easy to see that the study of the behavior of Moore automata
(both determinate and random) in stationary random media reduces to
the study of uniform Markov chains whose states can be identified with
the states of the considered automata. Indeed, the state a(t) of the
automaton A at any moment of time uniquely defines the probabilities of
the output signals v(t) and, consequently, in view of the definition of
the stationary random medium, also the probabilities of the input
signals of the automaton at the directly following moment of time
t + 1. The latter probabilities uniquely determine the probabilities of
the transitions of the automaton A from the state a(t) into any of the
following states.
With the notation assumed above, the transition probabilities of
the corresponding Markov chain are determined by the equations

$$p_{ik} = \sum_{j=1}^{q} \sum_{m=1}^{l} \lambda_{ij} \, r_{jm} \, p_{ik}^{(m)}. \tag{75}$$
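Equation (75) translates directly into a triple loop. The Python sketch below is illustrative (the function name and argument layout are assumptions): `L[i][j]` stands for $\lambda_{ij}$, `R[j][m]` for $r_{jm}$, and `transitions[m][i][k]` for $p_{ik}^{(m)}$. As a usage example it rebuilds the chain of the two-state automaton considered below.

```python
from fractions import Fraction

def chain_matrix(L, R, transitions):
    """Transition matrix of the Markov chain of a Moore automaton in a
    stationary random medium, following equation (75).

    L           -- output probabilities, L[i][j] = lambda_ij
    R           -- reaction probabilities, R[j][m] = r_jm
    transitions -- transitions[m][i][k] = p_ik^(m)
    """
    n_states = len(L)
    n_out, n_in = len(R), len(R[0])
    P = [[Fraction(0)] * n_states for _ in range(n_states)]
    for i in range(n_states):
        for k in range(n_states):
            P[i][k] = sum(L[i][j] * R[j][m] * transitions[m][i][k]
                          for j in range(n_out) for m in range(n_in))
    return P

# Two-state determinate automaton: state 1 outputs v_0, state 2 outputs v_1;
# input x_0 keeps the state, input x_1 switches it.
p0, p1 = Fraction(1, 5), Fraction(3, 5)
L = [[1, 0], [0, 1]]
R = [[1 - p0, p0], [1 - p1, p1]]
T = [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]
print(chain_matrix(L, R, T))
```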


We shall describe, after Tsetlin [82], several very simple problems
on the behavior of automata in random media. To do this we consider the
class K of determinate Moore automata having the two input signals
$x_0 = 0$ and $x_1 = 1$ and the two output signals $v_0 = 0$ and $v_1 = 1$. We
specify the stationary random medium C by the matrix R of the reaction
probabilities

$$R = \begin{Vmatrix} 1 - p_0 & p_0 \\ 1 - p_1 & p_1 \end{Vmatrix}.$$
Let us term the input signal $x_1$ penalty and the input signal $x_0$
no-penalty. Then we can say that with the output of the signal $v_0$ the
medium penalizes the automaton with the probability $p_0$, and with the
output of the signal $v_1$, with the probability $p_1$.

Let us consider first the Moore automaton A with the two states
$a_1 = 1$ and $a_2 = 2$, given by the matrix of the output probabilities
$L = \begin{Vmatrix} 1 & 0 \\ 0 & 1 \end{Vmatrix}$ and the matrices of the transition probabilities

$$P^{(0)} = \begin{Vmatrix} 1 & 0 \\ 0 & 1 \end{Vmatrix}, \qquad P^{(1)} = \begin{Vmatrix} 0 & 1 \\ 1 & 0 \end{Vmatrix}.$$

In other words, A is a determinate automaton which delivers in the
first state the output signal $v_0 = 0$ and in the second state the
output signal $v_1 = 1$, retaining its state under the influence of the
input signal $x_0 = 0$ and changing to the opposite state under the
influence of the input signal $x_1 = 1$.

In accordance with what we have said above, the functioning of the
automaton A in the medium C is described by the uniform Markov chain M
with the two states $a_1 = 1$ and $a_2 = 2$. From equation (75) we easily
find the matrix P of the transition probabilities of this chain:

$$P = \begin{Vmatrix} 1 - p_0 & p_0 \\ p_1 & 1 - p_1 \end{Vmatrix}.$$

The characteristic polynomial of this matrix,

$$P(\lambda) = \begin{vmatrix} \lambda - 1 + p_0 & -p_0 \\ -p_1 & \lambda - 1 + p_1 \end{vmatrix},$$

is equal to $\lambda^2 - (2 - p_0 - p_1)\lambda + 1 - p_0 - p_1$, and its
eigenvalues are respectively $\lambda_1 = 1$ and $\lambda_2 = 1 - p_0 - p_1$. If both
probabilities $p_0$ and $p_1$ lie strictly between zero and unity, then
the modulus of the second eigenvalue $\lambda_2$ is less than unity and, by
Theorem 2, the chain M is ergodic in this case.

The polynomial $\psi_1(\lambda)$ will obviously be equal to $\lambda - 1 + p_0 + p_1$,
and after application of equations (74) we easily find the limit
transition probabilities of the considered chain:

$$p_{11}^{(\infty)} = p_{21}^{(\infty)} = \frac{p_1}{p_0 + p_1}, \qquad p_{12}^{(\infty)} = p_{22}^{(\infty)} = \frac{p_0}{p_0 + p_1}.$$
Thus, after sufficiently long functioning in the medium C the
automaton A, regardless of the choice of the initial state, will be in
the first state with the probability $p_1/(p_0 + p_1)$ and in the second
state with the probability $p_0/(p_0 + p_1)$. Since the penalty
probability in the first state of the automaton is equal to $p_0$, and in
the second state to $p_1$, the mathematical expectation S of penalty of
the automaton A at each step (after sufficiently long preliminary
functioning) is expressed by the equation

$$S = p_0 \frac{p_1}{p_0 + p_1} + p_1 \frac{p_0}{p_0 + p_1} = \frac{2 p_0 p_1}{p_0 + p_1}.$$
With $p_0 \ne p_1$ the quantity S is strictly less than the mean penalty
probability $S_0 = \frac{1}{2}(p_0 + p_1)$. Indeed,

$$S_0 - S = \frac{(p_0 + p_1)^2 - 4 p_0 p_1}{2(p_0 + p_1)} = \frac{(p_0 - p_1)^2}{2(p_0 + p_1)} \ge 0,$$

where equality is obviously achieved only when $p_0 = p_1$. Thus, the
considered automaton A possesses purposeful behavior in the sense that,
when it is placed in any stationary random medium which differentiates
its two possible reactions, it tends to that behavior for which the
penalty is on the average less than for an automaton delivering both
of its possible output signals (reactions) with equal probabilities.
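The inequality $S \le S_0$ can also be confirmed by exact arithmetic on a grid of penalty probabilities. The Python sketch below simply evaluates the formulas above with fractions; the function names and the grid of values are arbitrary choices for illustration.

```python
from fractions import Fraction

def two_state_penalty(p0, p1):
    """Limiting expected penalty S of the two-state automaton A in the
    medium C, from the limit probabilities p1/(p0+p1) and p0/(p0+p1)."""
    in_first = p1 / (p0 + p1)     # limit probability of state a_1 (output v_0)
    in_second = p0 / (p0 + p1)    # limit probability of state a_2 (output v_1)
    return in_first * p0 + in_second * p1

def mean_penalty(p0, p1):
    """S_0: penalty of an automaton choosing both outputs with probability 1/2."""
    return (p0 + p1) / 2

grid = [Fraction(k, 10) for k in range(1, 10)]
for p0 in grid:
    for p1 in grid:
        S, S0 = two_state_penalty(p0, p1), mean_penalty(p0, p1)
        assert S == 2 * p0 * p1 / (p0 + p1)
        assert S0 - S == (p0 - p1) ** 2 / (2 * (p0 + p1))
        assert (S < S0) == (p0 != p1)

print(two_state_penalty(Fraction(1, 5), Fraction(3, 5)))   # prints: 3/10
```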

Let us now select, in place of the considered automaton A, the
automaton $A_n$ with 2n states 1, 2, ..., n, n + 1, ..., 2n - 1, 2n. We
assume that in the states 1, 2, ..., n it delivers the output signal
$v_0 = 0$, and in all the remaining states it delivers the output signal
$v_1 = 1$. Assume, further, that the transition table of the automaton
$A_n$ is written as

 x \ a |  1   2   3  ...  n-1  n    n+1  n+2  n+3  ...  2n-1  2n
 ------+-----------------------------------------------------------
   0   |  1   1   2  ...  n-2  n-1  n+1  n+1  n+2  ...  2n-2  2n-1
   1   |  2   3   4  ...  n    2n   n+2  n+3  n+4  ...  2n    n

To this table there corresponds the transition graph shown in
Fig. 12.

Fig. 12

From the form of its graph it is natural to term it an automaton
with linear tactic. The automaton A analyzed above is obviously a
particular case of the automaton $A_n$ with linear tactic, for which the
value of n is equal to unity. The behavior of the automaton $A_n$ with
linear tactic in the general case is studied exactly as in the
considered particular case, although, of course, the corresponding
operations are considerably more complex. These analyses lead to the
conclusion that the mathematical expectation $S_n$ of penalty of the
automaton $A_n$ at one step of its operation (after a sufficiently long
period of adaptation), calculated by analogy with the preceding case,
tends with unlimited increase of the number n toward a natural minimal
value $S_{\min}$, equal to the smaller of the two numbers $p_0$, $p_1$. It is
easy to see that the quantity $S_{\min}$ is the absolute minimum of the
mathematical expectation of penalty for all automata operating in the
considered random medium.
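The convergence of $S_n$ toward $\min(p_0, p_1)$ can be observed on small cases. The Python sketch below (exact fractions) encodes the transition table above as a function, builds the Markov chain of $A_n$ in the medium C by the same reasoning as in equation (75), and solves for the limit (stationary) distribution by Gauss-Jordan elimination. The elimination is a choice of this sketch rather than the Perron equation used in the text, and the values $p_0 = 1/5$, $p_1 = 3/5$ are arbitrary.

```python
from fractions import Fraction

def next_state(a, x, n):
    """Transition table of the linear-tactic automaton A_n (states 1..2n)."""
    if x == 0:                          # no penalty: move one step deeper
        return a if a in (1, n + 1) else a - 1
    if a == n:                          # penalty in the outer v_0 state
        return 2 * n
    if a == 2 * n:                      # penalty in the outer v_1 state
        return n
    return a + 1                        # penalty: move one step outward

def stationary(P):
    """Solve pi P = pi, sum(pi) = 1 by Gauss-Jordan elimination (exact)."""
    size = len(P)
    # Equation j: sum_i pi_i (P[i][j] - delta_ij) = 0, with right-hand side 0.
    A = [[P[i][j] - (1 if i == j else 0) for i in range(size)] + [Fraction(0)]
         for j in range(size)]
    A[-1] = [Fraction(1)] * size + [Fraction(1)]   # normalization replaces one equation
    for c in range(size):
        piv = next(r for r in range(c, size) if A[r][c] != 0)
        A[c], A[piv] = A[piv], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(size):
            if r != c and A[r][c] != 0:
                factor = A[r][c]
                A[r] = [u - factor * w for u, w in zip(A[r], A[c])]
    return [A[r][size] for r in range(size)]

def limiting_penalty(n, p0, p1):
    """S_n: limiting expected penalty of A_n in the stationary medium C."""
    p = (p0, p1)
    size = 2 * n
    P = [[Fraction(0)] * size for _ in range(size)]
    for a in range(1, size + 1):
        v = 0 if a <= n else 1                          # output signal of state a
        P[a - 1][next_state(a, 0, n) - 1] += 1 - p[v]   # no penalty
        P[a - 1][next_state(a, 1, n) - 1] += p[v]       # penalty
    pi = stationary(P)
    return sum(pi[a - 1] * p[0 if a <= n else 1] for a in range(1, size + 1))

# n = 1 gives 3/10 (the value 2 p0 p1 / (p0 + p1)); n = 2 gives 6/25.
for n in range(1, 6):
    print(n, limiting_penalty(n, Fraction(1, 5), Fraction(3, 5)))
```

Exact fractions are used because for larger n the chain spends very long stretches in the better branch, which would make simple power iteration converge slowly.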

It turns out that in many cases the apparatus of uniform Markov
chains can be used with success for the study of the behavior of
automata not only in stationary, but also in certain nonstationary
random media. Let us assume, for example, that there are several
stationary random media $C_1, C_2, \ldots, C_k$ similar to the medium C
described above, but having different probability pairs $p_0$, $p_1$. From
these media we can construct the nonstationary random medium N by
introducing the matrix $B = \|b_{ij}\|$ of the transition probabilities of
some Markov chain with k states. At any given moment t of discrete time
the medium N acts like one of the media $C_1, C_2, \ldots, C_k$. If its
actions follow the model of the medium $C_i$, then we say that the medium
N is in the ith state. The quantity $b_{ij}$ is the probability of the
transition of the medium N from the ith state into the jth
(i, j = 1, 2, ..., k). The probabilities $b_{ij}$ are assumed to be
constant in the course of time.

If now some automaton A functions in the medium N, then the pairs
$(C_i, a_s)$, consisting of the state $C_i$ of the medium N and the state
$a_s$ of the automaton A, can be selected as the states of some uniform
Markov chain. The matrix of the transition probabilities of this chain
can easily be constructed from the matrices of the reaction
probabilities of the media $C_1, C_2, \ldots, C_k$, the matrix B, and the
switching function and output function of the automaton A.
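One possible construction of this combined chain is sketched below in Python. The text does not fix the order of events within a cycle, so the sketch adopts a hypothetical convention: the medium first passes from $C_i$ to $C_{i'}$ with probability $b_{ii'}$, and the new medium $C_{i'}$ then reacts to the automaton's output as in equation (75). The function name and the example media are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

def combined_chain(B, reactions, L, transitions):
    """Transition matrix over pairs (medium state, automaton state).

    Hypothetical timing convention: in one cycle the medium first passes
    from C_i to C_i2 with probability B[i][i2], and the new medium C_i2
    then reacts to the automaton's output as in equation (75).
    reactions[i] is the reaction matrix R of medium C_i.
    """
    k, n = len(B), len(L)
    pairs = list(product(range(k), range(n)))
    M = [[Fraction(0)] * len(pairs) for _ in pairs]
    for row, (i, a) in enumerate(pairs):
        for col, (i2, a2) in enumerate(pairs):
            R = reactions[i2]
            step = sum(L[a][j] * R[j][m] * transitions[m][a][a2]
                       for j in range(len(R)) for m in range(len(R[0])))
            M[row][col] = B[i][i2] * step
    return pairs, M

# Two media with swapped penalty probabilities, switching rarely.
R1 = [[Fraction(4, 5), Fraction(1, 5)], [Fraction(2, 5), Fraction(3, 5)]]
R2 = [[Fraction(2, 5), Fraction(3, 5)], [Fraction(4, 5), Fraction(1, 5)]]
B = [[Fraction(9, 10), Fraction(1, 10)], [Fraction(1, 10), Fraction(9, 10)]]
L = [[1, 0], [0, 1]]
T = [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]
pairs, M = combined_chain(B, [R1, R2], L, T)
print(pairs)
```

With a single medium (k = 1, B = [[1]]) the construction reduces to the stationary case of equation (75).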
It can be shown [82] that in the class K of automata $A_n$ with
linear tactic operating in the described nonstationary medium N there
is (depending on the choice of the medium N) an optimal automaton
having a minimal (in the class K) mathematical expectation of the
limiting value of the penalty at each step of its operation. Thus, in
contrast with stationary media, for automata with linear tactic
operating in nonstationary media it is advisable to increase the volume
of the automaton memory (number of states) only up to a certain limit,
after which a further increase of the memory leads to deterioration
rather than improvement of the quality of the operation of the
automaton.

§5. THE PROBLEM OF PATTERN RECOGNITION TRAINING

One of the most significant fields of application of the theory of
self-organizing systems is the problem of the recognition of visual
patterns. The recognition of visual patterns and the training for such
recognition is a brilliant example of the adaptive properties of the
human brain. The meaning of pattern recognition is that the human
observer combines certain sets of objects or phenomena which he
observes into a single class, termed a pattern. The patterns with which
a human being operates are not random combinations of objects, but
rather combinations related by some common properties. Considering
mainly visual patterns, we shall in what follows term the individual
objects which compose a pattern images.

Examples of visual patterns might be the set of all the images of a
particular letter or digit, the set of the images of all possible
buildings, etc. By analogy with visual patterns we can also consider
sound patterns (for example, the set of all the pronunciations of a
particular phoneme, the set of all waltzes, etc.) and patterns of any
other nature. In what follows we shall limit ourselves, for the sake of
examples, to the consideration of visual patterns only; however, all
our theoretical constructions will be applicable not only to visual
patterns, but also to any other patterns.

For the following constructions we need first of all the
definitions of abstract images and patterns. We will assume that the
images are perceived by some set of sensitive elements, the receptors.
By analogy with the case of visual patterns perceived by the human eye,
we term this set the retina. In the case of abstract images and
patterns we do not need to be more specific on the question of the
spatial arrangement of the receptors. However, in the case of specific
visual patterns this refinement is useful. In this case we will usually
assume that the receptors constituting the retina are arranged on a
plane at points forming a regular grid, i.e., at the points with the
coordinates (a + ic, b + jc), where $c \ne 0$, i runs through the set of
values 0, 1, ..., m - 1 and j runs through the set of values
0, 1, ..., n - 1. In what follows we shall call such a retina a regular
rectangular (n × m)-retina.

The task of the retina is to convert the image projected onto it
into some ensemble of signals of a standard form which are put out by
the receptors composing the retina. In what follows we shall
distinguish two kinds of receptors: the so-called continuous receptors,
whose output signals can be any real numbers on some fixed segment of
the number line, and the so-called discrete receptors, which can
deliver only two different output signals. Without loss of generality
we can fix the number segment [0, 1] as the domain of values of the
output signals of the continuous receptors, and the ends of this
segment, i.e., the numbers 0 and 1, as the possible values of the
output signals of the discrete receptors.

In the case of visual patterns we will always assume that the
output signal of a continuous receptor is equal to the brightness of
the image point projected on the given receptor, expressed in relative
units: zero corresponds to absolutely black points, and unity to
absolutely white points of the image (reflecting 100% of the light
incident on them). For the discrete receptors we establish some
brightness threshold. The points whose brightness does not exceed the
value of this threshold will correspond to a zero output signal. More
specifically, however, in the case of discrete receptors we shall
consider only two-tone images consisting either of points of zero
brightness (background) or of points of unit brightness (the image
itself). In what follows we shall use precisely this latter point of
view.

We shall call an abstract image any fixed ensemble of output signals
of the receptors constituting the retina. If the total number of
receptors in the retina is N, then, in view of the assumptions made
above, the abstract image can naturally be identified with some point
of the N-dimensional unit cube. In the case of continuous receptors all
the points of this cube correspond to images, while in the case of
discrete receptors only the cube vertices correspond to images. In
connection with this, we shall call the N-dimensional unit cube (in the
case of continuous receptors) or the set of its vertices (in the case
of discrete receptors) the image space. In the first case this space is
continuous, in the second case it is discrete (consisting of $2^N$
different points).
It is natural to introduce the following "metric" in the image
space: the distance between two points of this space (i.e., between two
images) is the square root of the sum of the squares of the differences
of the corresponding coordinates of these points:

$$d = \sqrt{(x_1 - x_1')^2 + (x_2 - x_2')^2 + \cdots + (x_N - x_N')^2}.$$

The R-neighborhood of any point M of the image space is the ensemble of
all points of this space removed from the point M by a distance less
than or equal to R.

We note that the definitions introduced are applicable to both the
continuous and the discrete image spaces. In the second case the
distance between any two points is the square root of the total number
$\nu$ of noncoincidences of the coordinates of these points. However, it is
more convenient for us to consider that in the case of discrete spaces
the distance is this number $\nu$ itself. Then all the distances will be
expressed by whole numbers. After the introduction of the distance in
the image space, we can speak of the closeness of particular points to
one another. From the intuitive considerations associated with the
concept of the pattern, it follows that the images lying sufficiently
close to a particular image from some pattern must belong to this
pattern itself. This circumstance must somehow be taken into account in
the definition of the concept of the abstract pattern.
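The two distances can be written down directly. In this Python sketch (function names are illustrative), images are tuples of receptor outputs; for the discrete space the distance is taken to be the count $\nu$ of noncoincident coordinates, as agreed above, and the R-neighborhood of a vertex is enumerated by brute force, which is practical only for small N.

```python
import math
from itertools import product

def euclidean(x, y):
    """Distance in the continuous image space (points of the unit N-cube)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def discrete_distance(x, y):
    """Distance in the discrete image space: the number of noncoincident
    coordinates (the square of the Euclidean distance between vertices)."""
    return sum(a != b for a, b in zip(x, y))

def neighborhood(center, radius):
    """R-neighborhood of a vertex of the discrete N-cube, as a set of tuples."""
    return {v for v in product((0, 1), repeat=len(center))
            if discrete_distance(v, center) <= radius}

x, y = (0, 1, 1, 0), (1, 1, 0, 0)
print(discrete_distance(x, y), euclidean(x, y))
print(len(neighborhood((0, 0, 0, 0), 1)))   # the vertex itself and its 4 neighbors
```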

In the case of the continuous image space, a pattern can be only a
set of points of this space which, together with any point M, also
wholly contains some $\varepsilon$-neighborhood of the point M (the magnitude of
$\varepsilon$ depends on the choice of the point M). Sets having this property
are termed open sets. Thus, in the case of continuous receptors we
shall term any open set of the image space an abstract pattern.

An example of an open set might be the interior of a sphere having
the same dimension as the considered (continuous) image space. It is
important to emphasize once again that the degree of smallness of the
changes which can be introduced in an image without changing its
membership in a given pattern depends on the choice of the image
itself. The permissible variations are reduced for images located
closer to the boundary of the pattern (in the example in question the
surface of the sphere serves as the boundary), while they are increased
for images sufficiently removed from the boundary.

In the case of the discrete image space all its subsets will
obviously be open sets and can consequently be considered as abstract
patterns. For narrowing the class of patterns in the discrete space we
can make use of the concept of the boundary index of a set. Let M be
any subset of the discrete image space R which we introduced, and m the
number of elements of this set. If the set M does not coincide with the
entire space R, then among its elements there will be those for which,
at a distance equal to unity from them, there lie elements not
belonging to the set M. Let us term such elements boundary elements and
denote by $m_1$ the number of all the boundary elements of the set M. We
call the ratio $m_1/m$ the boundary index of the considered set M.

It is easy to see that the smaller the boundary index of a set, the
more closely it resembles in its properties the open sets of continuous
spaces: an ever larger portion of the points of the set are contained
in it together with their 1-neighborhoods. Therefore it is natural to
state the proposition which Braverman [11] has termed the compactness
hypothesis: only those sets whose boundary indices are sufficiently
small can serve as patterns in the discrete image space.
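For small N the boundary index can be computed by brute force. The Python sketch below is illustrative only (the function name and the example sets are assumptions of this sketch). It represents vertices as tuples of 0s and 1s and counts the elements of M having a neighbor at distance 1 outside M.

```python
from itertools import product

def boundary_index(M, N):
    """Boundary index m_1/m of a set M of vertices of the discrete N-cube.

    A vertex of M is a boundary element if some vertex at distance 1
    (one coordinate flipped) lies outside M.
    """
    M = set(M)

    def neighbors(v):
        return (v[:i] + (1 - v[i],) + v[i + 1:] for i in range(N))

    boundary = [v for v in M if any(u not in M for u in neighbors(v))]
    return len(boundary) / len(M)

cube = list(product((0, 1), repeat=4))
ball = {v for v in cube if sum(v) <= 2}      # 2-neighborhood of (0, 0, 0, 0)
half = {v for v in cube if v[0] == 0}        # a 3-dimensional face
print(boundary_index(ball, 4), boundary_index(half, 4))
```

In this small example the ball has the smaller index (6/11 against 1 for the face), so under the compactness hypothesis it is the better candidate for a pattern.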

A more detailed study shows that this proposition must be refined by
means of certain additional probability-theoretic constructions. Let R
be a discrete image space in which some subset S (a pattern) has been
fixed. Let, further, for each element $x_i$ of the space R there be given
the probability $f(x_i)$ of the appearance of this element (image) in
some series of experiments of the type of independent trials; let
$f_S(x_i)$ be the conditional probability of the appearance of the element
$x_i$ under the condition that it belongs to the pattern S. If the element
$x_i$ belongs to the pattern S, then for any natural number k let
$Q_k(x_i)$ denote the set of all points not contained in the pattern S and
removed from the element $x_i$ by a distance less than or equal to k. The
quantity

$$g_S(k) = \sum_{x_i \in S} f_S(x_i) \sum_{x_j \in Q_k(x_i)} f(x_j)$$

is the probability of wrongly assigning to the pattern S, in a
following trial, an element which does not belong to it, as a result of
the inclusion in S of all the elements lying in the k-neighborhood of
the element $x_i$ randomly selected from the pattern S in the preceding
trial.

We term the operation of including in a particular pattern S all the
elements of the k-neighborhood of some element x of this pattern the
operation of k-extrapolation with respect to the element x. The
quantity $g_S(k)$ found above is the probability of the occurrence of an
error as the result of the operation of k-extrapolation with respect to
a randomly selected element of the pattern S. The refinement of the
compactness hypothesis mentioned above consists in the assumption that
for every discrete image space R there exists a number N = N(R) such
that N > 1 and, for all values k < N, the probability $g_S(k)$ of the
occurrence of an error as the result of k-extrapolation does not exceed
a negligibly small constant nonnegative quantity $\varepsilon$ for any patterns S.
We term this the hypothesis of the N-extrapolatability of patterns with
accuracy to $\varepsilon$.


The operation of k-extrapolation can obviously also be defined for
patterns in continuous image spaces. Replacing summation by integra-
tion, we obtain, by analogy with the expression for $g_S(k)$, an
expression for the probability of an error as the result of extra-
polation in the continuous case. In exactly the same way as for
discrete spaces, we can formulate the hypothesis of the N-extrapo-
latability of patterns in continuous image spaces.

The pattern recognition problem requires a precise description of
the features characterizing each pattern. However, such a description
can by no means always be given without overcoming very serious
difficulties. Therefore, in practice we usually follow the path of
constructing algorithms which make it possible to accomplish so-called
training for pattern recognition. The essence of this training is to
obtain an approximate description (or, as it is customarily phrased,
an approximation) of the pattern as the result of showing some set
(generally speaking, not all!) of the images composing this pattern.

Based on the hypothesis of the N-extrapolatability of the patterns,
we can construct the so-called general approximation algorithm, which
makes it possible to accomplish pattern recognition training. Like
every algorithm of the self-improving type, the general approximation
algorithm A has two operating periods: the learning period and the
examination period.

In the learning period various representations of the patterns
$R_1, R_2, \ldots, R_n$ which are to be recognized are applied to the
input of the algorithm A. The corresponding representations (images)
are chosen at random (most frequently by the method of independent
trials) and are accompanied by an indication of which of the patterns
$R_1, R_2, \ldots, R_n$ each selected image belongs to. All the images
shown in the learning period are stored and are used in the exami-
nation regime to determine to which one of the patterns
$R_1, R_2, \ldots, R_n$ the next (also randomly selected) image r
belongs.

To do this, the distances in the image space are determined from
the image r first to the representations of the pattern $R_1$ selected
in the learning period, then to the representations of the pattern
$R_2$, and so on, until the next distance determined to some repre-
sentative of some pattern $R_i$ (i = 1, 2, ..., n) is found to be less
than or equal to the extrapolatability coefficient N. In this case the
image r is associated with the pattern $R_i$. If, however, all the
distances are larger than N, the image r remains unrecognized, i.e.,
it is not associated with any of the patterns $R_1, R_2, \ldots, R_n$.
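A minimal Python sketch of the algorithm A with this first exami-
nation regime follows. The Hamming metric on binary tuples, the class
name `GeneralApproximation`, and the 0-based pattern indices are
assumptions made for illustration.

```python
def hamming(x, y):
    """Hamming distance between two equal-length binary tuples (assumed metric)."""
    return sum(a != b for a, b in zip(x, y))


class GeneralApproximation:
    """Sketch of the general approximation algorithm A (first examination regime)."""

    def __init__(self, n_patterns, N, dist=hamming):
        self.N = N                                     # extrapolatability coefficient
        self.dist = dist
        self.memory = [[] for _ in range(n_patterns)]  # stored images per pattern

    def learn(self, image, pattern):
        """Learning period: store the image with the pattern it belongs to."""
        self.memory[pattern].append(image)

    def examine(self, r):
        """Scan patterns in order R_1, R_2, ...; return the first one having a
        stored representative within distance N of r, else None (unrecognized)."""
        for i, reps in enumerate(self.memory):
            if any(self.dist(r, s) <= self.N for s in reps):
                return i
        return None


A = GeneralApproximation(n_patterns=2, N=1)
A.learn((0, 0, 0, 0), 0)
A.learn((1, 1, 1, 1), 1)
```

Here the image (0, 0, 1, 1) lies at distance 2 from both stored
images, so with N = 1 it remains unrecognized.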

It is easy to see that if all the patterns $R_1, R_2, \ldots, R_n$
are N-extrapolatable with accuracy to ε and can be covered by a number
of N-neighborhoods (spheres of radius N) which is significantly less
than 1/ε, then the described algorithm gives a good approximation of
the chosen patterns (with a small probability of error on exami-
nation).

In practice the different patterns in the image space are usually
not in direct contact with one another. If we take as the coefficient
N the minimal distance between patterns, then in a discrete space the
absence of contact between the patterns means that N ≥ 2. If we also
assume that the probability of the appearance of images not belonging
to any of the selected patterns $R_1, R_2, \ldots, R_n$ is equal to
zero, then all these patterns are obviously (N − 1)-extrapolatable
with accuracy to ε = 0. Under these assumptions the general approxi-
mation algorithm using (N − 1)-neighborhoods obviously leads, given a
sufficiently long learning period, to an arbitrarily good approxi-
mation (with an arbitrarily small error probability).

This general approximation algorithm admits further improvements
in several different directions. First, in addition to the examination
regime described above, we can introduce a second type of examination
regime. In this case the image r appearing in the examination regime
is assigned to that one of the patterns $R_1, R_2, \ldots, R_n$ which
contains the representative (memorized in the learning period) located
closest of all to the image r. For definiteness we assume that if there
are several such patterns, preference is given to that one of them which
The second improvement consists in economizing the memory: if the
N-neighborhood of any image S appearing in the learning period is
completely covered by the N-neighborhoods of the other images also
shown in the training period, then the image S can immediately be
eliminated from the memory and not used during the examination. In
practice it is advisable to use this improvement in a somewhat dif-
ferent modification, in which the maximal number of representations
of each pattern which can be remembered in the learning period is
limited ahead of time. For each remembered representation its relative
usefulness is recorded. As the criterion of the relative usefulness of
any image r we can use, for example, the ratio of the number of cases
in which this image was used for the correct recognition of subsequent
images to the total number of images which appeared after the memori-
zation of the image r. Only those images are memorized which cannot be
correctly recognized with the aid of the images already available in
the memory. If, in addition, the memory set aside for storing the
representations of a particular pattern is found to be completely
full, then the representation being memorized forces out of the memory
the representation of the given pattern which has the lowest relative
usefulness.
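The following sketch combines the second examination regime (assign
the image to the pattern of the nearest stored representative) with
the limited-memory modification just described. It is a hypothetical
illustration: the Hamming metric, the per-pattern `capacity` (assumed
to be at least 1), and tie-breaking toward the lower pattern index are
assumptions, not rules taken from the text.

```python
def hamming(x, y):
    """Hamming distance between two equal-length binary tuples (assumed metric)."""
    return sum(a != b for a, b in zip(x, y))


class EconomicalLearner:
    """Nearest-representative recognition with a limited memory per pattern.

    Each stored entry is [image, correct_uses, images_seen_since_stored];
    its relative usefulness is correct_uses / images_seen_since_stored.
    """

    def __init__(self, n_patterns, capacity, dist=hamming):
        self.capacity = capacity                       # assumed >= 1
        self.dist = dist
        self.memory = [[] for _ in range(n_patterns)]

    def nearest(self, r):
        """Return (pattern, entry) of the closest stored representative, or
        (None, None) if memory is empty; ties go to the lower pattern index."""
        best = None
        for p, entries in enumerate(self.memory):
            for entry in entries:
                d = self.dist(r, entry[0])
                if best is None or d < best[0]:
                    best = (d, p, entry)
        return (None, None) if best is None else (best[1], best[2])

    def learn(self, image, pattern):
        p, entry = self.nearest(image)
        for entries in self.memory:        # one more image appeared after each stored one
            for other in entries:
                other[2] += 1
        if p == pattern:                   # recognized correctly: credit it, do not store
            entry[1] += 1
            return
        entries = self.memory[pattern]
        if len(entries) >= self.capacity:  # memory for this pattern is full
            worst = min(range(len(entries)),
                        key=lambda t: entries[t][1] / entries[t][2] if entries[t][2] else 0.0)
            del entries[worst]             # force out the least useful representation
        entries.append([image, 0, 0])

    def examine(self, r):
        """Second examination regime: pattern of the nearest representative."""
        return self.nearest(r)[0]
```

For example, with capacity 1, showing (0, 0, 0) and (1, 1, 1) stores
both, while a later (0, 0, 1) of pattern 0 is recognized correctly,
credits the stored (0, 0, 0), and is not memorized.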

The third improvement involves, along with the "natural" retina
(onto which the images being recognized are projected directly), the
use of a new retina whose output signals are suitably selected func-
tions of the output signals of the first retina. The image space in
which the patterns are defined is constructed from the output signals
of the second retina (incidentally, it is more natural to call it the
feature space rather than the image space). With the aid of a judi-
ciously chosen transformation of the original space we can consid-
erably simplify the problem of pattern recognition training. Commonly
used, in particular, are transformations which generate features that
do not depend on a parallel shift or a change in the size of the
images.

Finally, we indicate still another modification of the general
approximation algorithm A. The regime in which this algorithm
receives, during its self-improvement period, certain images together
with an indication of the pattern to which they belong is termed the
training regime. In many cases it is advisable to consider, in
addition to this regime, the so-called self-training regime.

In the self-training regime planned for the formation of n dif-
ferent patterns, there are first given n randomly selected images
$r_1, r_2, \ldots, r_n$, each of which is taken to be the representation
of a separate pattern. The next image shown, $\beta_1$, is associated
with that pattern whose representation is located closest to $\beta_1$,
and the number of representatives of this pattern is increased. In the
following step the next image $\beta_2$ is compared with the augmented
set of representations and is again associated with that pattern some
representation of which is located closer to $\beta_2$ than the repre-
sentatives of all the other patterns. Thereafter the storage proceeds
either according to the usual scheme or by the scheme with replacement
described above (with limited memory). The recognition of images in
the examination regime can be performed by either of the two methods
described above: by N-extrapolation of the patterns, or by the method
based on determining the shortest distance.
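The self-training regime with the usual (unlimited) storage scheme
can be sketched in Python as follows. The Hamming metric, 0-based
pattern indices, and tie-breaking toward the lower index are assump-
tions; `self_train` and `classify` are hypothetical names.

```python
def hamming(x, y):
    """Hamming distance between two equal-length binary tuples (assumed metric)."""
    return sum(a != b for a, b in zip(x, y))


def distance_to_pattern(img, reps, dist=hamming):
    """Shortest distance from img to any stored representative of one pattern."""
    return min(dist(img, s) for s in reps)


def self_train(images, n, dist=hamming):
    """The first n images seed n patterns; each later image joins the pattern
    owning the closest representative (min() keeps the lower index on ties)."""
    reps = [[img] for img in images[:n]]
    for img in images[n:]:
        best = min(range(n), key=lambda p: distance_to_pattern(img, reps[p], dist))
        reps[best].append(img)             # augment that pattern's representatives
    return reps


def classify(reps, r, dist=hamming):
    """Examination by the shortest-distance method."""
    return min(range(len(reps)), key=lambda p: distance_to_pattern(r, reps[p], dist))


images = [(0, 0, 0, 0), (1, 1, 1, 1), (0, 0, 0, 1), (1, 1, 1, 0), (0, 0, 1, 1)]
reps = self_train(images, 2)
```

No teacher is involved: the split of the later images between the two
patterns follows only from their distances to the seed images.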

This algorithm can be used for both discrete and continuous image
spaces. The approximation method used in it is, of course, not the
only possible one. Thus, instead of approximation by spherical neigh-
borhoods we could use neighborhoods of any other shape. Successful
results have been obtained by approximating the patterns with regions
bounded by hyperplanes (see, for example, Braverman [11]). Various
modifications of the metric described above are also possible in the
image space, which obviously alters the concept of spherical neighbor-
hoods: neighborhoods which are spherical in one metric may not be so
in another metric, and vice versa.

We shall describe several other, more specialized algorithms for
pattern recognition training which have been used successfully by
various authors. One of the simplest, although not very effective,
algorithms of this sort is the so-called perceptron of Rosenblatt [69].
Like every device for pattern recognition, the perceptron contains a
set of receptors, the retina. Hereafter, without stipulating this each
time, we shall consider only regular rectangular retinas. Depending on
the nature of the receptors, perceptrons are divided into continuous
perceptrons (with continuous receptors) and discrete perceptrons (with
discrete binary receptors).

In addition to the receptors, each perceptron contains two other
kinds of elements, termed A-elements and R-elements.

The A-elements are simplified models of neurons, and we shall
therefore hereafter call them simply neurons. In accordance with the
nature of the receptors used in the perceptron we distinguish contin-
uous and discrete neurons. Both types of neurons have two kinds of
inputs, termed stimulating and inhibiting. Each neuron has a finite
number of inputs and a single output; in addition, some real number,
termed the weight of the given neuron, is associated with it. As the
domain of values of the neuron weights we take the set of all real
numbers, regardless of whether the neurons being considered are
continuous or discrete.

In addition to the weight and the numbers of stimulating and inhib-
iting inputs, a neuron is also characterized by its functioning law,
which determines the output signal of the neuron as a function of its
input signals and weight. We must keep in mind that the inputs of all
the neurons in the perceptron are connected to the receptors, so that
the signals generated by the receptors serve as the input signals of
the neurons.

The continuous neurons, first considered by Rosenblatt [69], had a
functioning law of the form $z = v(\Sigma x - \Sigma y)$, where z is the
output signal, v is the neuron weight, $\Sigma x$ is the sum of the
signals applied to the neuron through the stimulating inputs, and
$\Sigma y$ is the sum of the signals applied to the neuron through the
inhibiting inputs (x, y, v and z are arbitrary numbers).

The functioning law of a discrete neuron is normally specified by
indicating an integer p termed the neuron triggering threshold, or
simply threshold. If the algebraic sum $\Sigma x - \Sigma y$ of the
stimulating and inhibiting input signals is less than the threshold,
the neuron is considered unstimulated and delivers an output signal
equal to zero. When the sum $\Sigma x - \Sigma y$ reaches or exceeds
the threshold, the neuron is stimulated and delivers an output signal
equal to its weight v (regardless of the amount by which the sum of
the input signals exceeds the threshold).
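Both functioning laws are short enough to state directly in code. In
this sketch the function names are hypothetical, and the signals on
the stimulating and inhibiting inputs are passed as two separate
lists.

```python
def continuous_neuron(v, excit, inhib):
    """Rosenblatt's continuous neuron: z = v * (sum x - sum y)."""
    return v * (sum(excit) - sum(inhib))


def discrete_neuron(v, excit, inhib, p):
    """Discrete neuron with threshold p: output v if sum x - sum y >= p, else 0."""
    return v if sum(excit) - sum(inhib) >= p else 0.0
```

Note that the discrete output does not grow with the excess of the
input sum over the threshold, while the continuous output does.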

It is convenient to characterize discrete neurons with the described
functioning law by three integers (k, l, p): the first equal to the
number of stimulating inputs, the second to the number of inhibiting
inputs, and the third to the threshold level. In the following con-
siderations the neuron weight will always be a variable quantity, and
therefore we shall not include it in the neuron characteristic. Dis-
crete neurons of the indicated kind having the same characteristic
triple (k, l, p) will be assigned to the same type, regardless of
possible differences in their weights.

Hereafter it is assumed that in any perceptron designed to distin-
guish k different patterns, the set of all neurons is partitioned
into k disjoint groups (subsets) placed in one-to-one correspondence
with the patterns being distinguished. For brevity we shall call the
neurons belonging to the group corresponding to the ith pattern the
neurons of the ith pattern (i = 1, 2, ..., k).

The inputs of each neuron in the perceptron are connected to the
receptors of the retina. Here it is assumed that the different inputs
of the same neuron are connected to different receptors. The outputs
of the neurons are connected to special summators termed R-elements,
the outputs of the neurons of the same pattern being connected to the
same summator, termed the summator of this pattern.

The output signal of the summator of any given pattern is equal to 
the sum of the weights of all the stimulated neurons of this pattern. 
If none of the neurons of the pattern being considered is stimulated, 
then the output signal of the corresponding summator is taken equal to 
zero. The final output signal of the entire perceptron is considered 
to be that pattern whose summator has the highest output signal. In the 
case when the maximal value of the output signal is attained simulta- 
neously by the summators of several patterns, the output signal of the 
perceptron is considered to be undefined. 
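The summator stage can be sketched as follows. Each neuron is repre-
sented, hypothetically, as a tuple (pattern, stimulating receptor
indices, inhibiting receptor indices, threshold p) with a separate
list of weights; images are tuples of 0/1 receptor signals, and `None`
stands for the undefined output.

```python
def stimulated(neuron, image):
    """True if the sum on stimulating inputs minus the sum on inhibiting
    inputs reaches the neuron's threshold."""
    _, excit, inhib, p = neuron
    return sum(image[r] for r in excit) - sum(image[r] for r in inhib) >= p


def perceptron_output(neurons, weights, image, n_patterns):
    """Index of the pattern whose summator (R-element) signal is largest,
    or None when the maximum is attained by several summators."""
    sums = [0.0] * n_patterns             # unstimulated patterns give zero
    for neuron, w in zip(neurons, weights):
        if stimulated(neuron, image):
            sums[neuron[0]] += w           # summator adds weights of stimulated neurons
    best = max(sums)
    winners = [k for k, s in enumerate(sums) if s == best]
    return winners[0] if len(winners) == 1 else None


neurons = [(0, (0,), (), 1), (1, (1,), (), 1)]
weights = [1.0, 2.0]
```

With these weights the image (1, 1) is assigned to pattern 1, while
the empty image (0, 0) leaves both summators at zero and the output
undefined.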

Taking as the input signal of the entire perceptron the image
projected on its retina, we obtain as the reaction of the perceptron
to this signal the pattern to which the perceptron assigns the given
image. It does not at all follow, of course, that the perceptron under
consideration classifies the images correctly in accordance with an
a priori specified division of the set of images into different
patterns. This initial division is specified by the operator. We shall
term it the original (or a priori) classification of the images, in
contrast with the actual classification accomplished by the chosen
perceptron.

Therefore it is also necessary to specify some process of varying
the perceptron characteristics which brings the actual classification
performed by the perceptron closer to the original classification as
the perceptron is shown various images. This process is specified by
indicating the so-called encouragement law.

As the basic encouragement law for discrete perceptrons we shall
choose a somewhat generalized form of the encouragement law in the
so-called α-systems considered by Joseph [34]. This law, which we
shall term the (generalized) α-law, is completely characterized by the
specification of two nonnegative constants a and b, not simultaneously
equal to zero. The meaning of this encouragement law is that after
each showing of a succeeding image to the perceptron the weights of
some neurons are increased by the amount a and the weights of others
are decreased by the amount b (the encouragement law of Joseph's
α-systems is obtained from the generalized α-law with a = 1, b = 0).

We distinguish two regimes of functioning of the perceptron with
generalized α-law encouragement. The first regime, termed the training
regime, consists in encouraging (increasing the weight by the amount
a) all the stimulated neurons of the pattern to which the image con-
sidered in the given step belongs, and in penalizing (reducing the
weight by the amount b) all the stimulated neurons of the remaining
patterns. Clearly, the correct pattern to which the given image
belongs must be indicated by the human teacher, since only the teacher
knows the original a priori classification of the images.

The second regime, termed the self-training regime, differs from
the training regime only in that the pattern to which the image being
considered belongs is determined by the perceptron itself: it is taken
to be the pattern actually delivered by the perceptron in response to
the showing of the given image. Of course, there is no guarantee here
that the response delivered by the perceptron will be correct (in the
sense of the original classification of the images). However, when
certain conditions are observed and the number of steps of the self-
training process is increased without limit, the process can some-
times reproduce the original classification of the images.
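The two regimes of generalized α-law encouragement for a discrete
perceptron might be sketched as follows. Neurons are hypothetical
tuples (pattern, stimulating receptor indices, inhibiting receptor
indices, threshold); the text does not say what self-training does
when the perceptron's answer is undefined, so this sketch simply skips
such steps (an assumption).

```python
def stimulated(neuron, image):
    """A discrete neuron fires when sum x - sum y reaches its threshold p."""
    _, excit, inhib, p = neuron
    return sum(image[r] for r in excit) - sum(image[r] for r in inhib) >= p


def output(neurons, weights, image, n_patterns):
    """Pattern with the largest summator signal, or None on a tie."""
    sums = [0.0] * n_patterns
    for neuron, w in zip(neurons, weights):
        if stimulated(neuron, image):
            sums[neuron[0]] += w
    best = max(sums)
    winners = [k for k, s in enumerate(sums) if s == best]
    return winners[0] if len(winners) == 1 else None


def alpha_step(neurons, weights, image, target, a, b):
    """Encourage stimulated neurons of `target` by a; penalize the other
    stimulated neurons by b. Unstimulated neurons are left unchanged."""
    for idx, neuron in enumerate(neurons):
        if stimulated(neuron, image):
            weights[idx] += a if neuron[0] == target else -b


def train(neurons, weights, sequence, a, b):
    """Training regime: the teacher supplies (image, correct pattern) pairs."""
    for image, pattern in sequence:
        alpha_step(neurons, weights, image, pattern, a, b)


def self_train(neurons, weights, images, n_patterns, a, b):
    """Self-training regime: the perceptron's own answer is the target.
    Steps with an undefined answer are skipped (assumption of this sketch)."""
    for image in images:
        answer = output(neurons, weights, image, n_patterns)
        if answer is not None:
            alpha_step(neurons, weights, image, answer, a, b)
```

Under this convention a perceptron whose initial weights are all zero
cannot start self-training on its own, since every answer is then
undefined; a few training steps are needed first.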

In addition to (generalized) α-law encouragement, in several cases
it is advisable to consider two other laws, which we shall term
respectively the (generalized) β-law and the (generalized) γ-law. Both
of these laws retain the principle of encouragement and penalizing
used in the (generalized) α-law. In addition, in the β-law at each
step (in both the training and self-training regimes) the weights of
all the neurons (both stimulated and unstimulated) are reduced by an
amount directly proportional to their weights, with a proportionality
coefficient δ which is the same for all the neurons. In the γ-law an
additional variation (beyond the operations of the α-law) is applied
to the weights of all the neurons (both stimulated and unstimulated)
by one and the same amount, selected at each step so that the sum of
the weights of all the neurons is always equal to zero.
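One step of the β- and γ-laws can be sketched on top of the α-law
increments. Here `stim` flags which neurons the current image stimu-
lates and `patterns` gives each neuron's pattern (both hypothetical
inputs); applying the β-law decay after the α-law increments is an
assumption, since the text does not fix the order.

```python
def alpha_increments(stim, patterns, target, a, b):
    """+a for stimulated neurons of the target pattern, -b for other
    stimulated neurons, 0 for unstimulated ones."""
    return [(a if pat == target else -b) if s else 0.0 for s, pat in zip(stim, patterns)]


def beta_step(weights, stim, patterns, target, a, b, delta):
    """alpha-law step, then every weight (stimulated or not) is reduced by
    delta * weight (decay applied after the increments: an assumption)."""
    inc = alpha_increments(stim, patterns, target, a, b)
    w = [wi + di for wi, di in zip(weights, inc)]
    return [wi - delta * wi for wi in w]


def gamma_step(weights, stim, patterns, target, a, b):
    """alpha-law step, then one common shift of all weights chosen so that
    the weights sum to zero."""
    inc = alpha_increments(stim, patterns, target, a, b)
    w = [wi + di for wi, di in zip(weights, inc)]
    shift = sum(w) / len(w)
    return [wi - shift for wi in w]
```

Both functions return a new weight list rather than modifying the old
one, which keeps a single step easy to compare with the plain α-law.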

For continuous neurons the generalized α-law of encouragement
consists in each neuron of the correct pattern (correct a priori, or
from the point of view of the perceptron) increasing its weight by the
product q of the constant a and the combined input signal of the
neuron: $q = a(\Sigma x - \Sigma y)$. Similarly, the weights of the
neurons of all the remaining patterns are reduced by the amount
$b(\Sigma x - \Sigma y)$ (individually for each neuron). The additions
which distinguish the β- and γ-laws remain the same as in the discrete
case.
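For continuous neurons the increments scale with each neuron's
combined input signal. A minimal sketch, assuming hypothetical helper
names and that the combined signals $\Sigma x - \Sigma y$ for the
current image are passed as the list `nets`:

```python
def combined_signal(receptors, excit, inhib):
    """Sum of signals on stimulating inputs minus sum on inhibiting inputs."""
    return sum(receptors[r] for r in excit) - sum(receptors[r] for r in inhib)


def continuous_alpha_step(weights, nets, patterns, target, a, b):
    """Neuron j of the correct pattern gains a * nets[j]; every other
    neuron loses b * nets[j]."""
    return [w + (a * n if pat == target else -b * n)
            for w, n, pat in zip(weights, nets, patterns)]
```

Unlike the discrete case, every neuron of a pattern changes its weight
here, in proportion to its own combined input signal.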

In constructing the theory of perceptron training and self-training
it is frequently advisable to consider not individual perceptrons but
certain classes of perceptrons. A perceptron class is a set of per-
ceptrons which can differ from one another only in the method of
connecting the neurons to the receptors and in the initial weights of
the neurons. All the remaining characteristics of the perceptrons
belonging to a particular class are assumed to be the same. These
characteristics include the form of the receptors and neurons, the
total number of receptors and the structure of the retina, the set of
images and the set of patterns, the original classification of the
images (their distribution over the patterns), the number of neurons
of each pattern, and, finally, the encouragement law.

The method of connecting the neurons to the receptors and the
initial weights of the neurons are considered random and are charac-
terized (within the limits of the selected class) by certain distri-
bution laws. In other words, the class of perceptrons is considered
not as an abstract set of perceptrons but as a set with a specified
probability field which determines the probability of selecting a
particular concrete representative of the class being considered. We
can thus consider that the specification of the class defines some
random perceptron.

The initial weights of the neurons are usually considered to be
independent random quantities having the same distribution law. In the
same way, the method of connecting each neuron to the retina is
assumed to be independent of the connections of the remaining neurons.
Each possible method of connecting an individual neuron to the retina
is assigned a probability, common to all the neurons. Here the
connection of all the neurons of the perceptron (from the perceptron
class being considered) to the retina is treated as a series of
independent trials characterized by the indicated probabilities.

Combining the probability characteristic of the method of connecting
the neurons to the retina with the distribution law for the initial
weights of the neurons, we arrive at the desired distribution law in
the class of perceptrons. One of the most frequently encountered
distribution laws is obtained when all the initial weights are deter-
minate and equal to the same number (most frequently zero), and all
the inputs of any given neuron are connected independently of one
another on the basis of a particular distribution law (most frequently
uniform) specified directly on the retina.
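Drawing one member of this frequently used class can be sketched as
follows: all initial weights are zero, and the k + l inputs of each
(k, l, p) neuron are attached to receptors chosen uniformly at random.
Because the inputs of one neuron must go to different receptors,
this sketch samples them without replacement via `random.sample`,
which is an assumption about how that rule is enforced; the function
name and parameters are hypothetical.

```python
import random


def sample_perceptron(n_receptors, neurons_per_pattern, n_patterns, k, l, p, rng=None):
    """Draw one discrete perceptron of (k, l, p) neurons with zero initial
    weights; each neuron's inputs go to distinct, uniformly chosen receptors."""
    if rng is None:
        rng = random.Random()
    neurons = []
    for pattern in range(n_patterns):
        for _ in range(neurons_per_pattern):
            inputs = rng.sample(range(n_receptors), k + l)   # distinct receptors
            neurons.append((pattern, tuple(inputs[:k]), tuple(inputs[k:]), p))
    weights = [0.0] * len(neurons)
    return neurons, weights


# A 4x4 retina (16 receptors), 3 patterns, 5 neurons of type (2, 1, 1) each.
neurons, weights = sample_perceptron(16, 5, 3, k=2, l=1, p=1, rng=random.Random(0))
```

Each call with a fresh generator yields another random perceptron of
the same class.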

In constructing the perceptron training theory we must consider the
so-called training sequences and classes of training sequences. A
training sequence is simply a finite sequence of images shown to the
perceptron one after another in the process of its training or self-
training. The total number of images shown (including repetitions) is
termed the length of the training sequence. A class of training
sequences is the set of all sequences of the same length on which a
distribution law is given that defines the probability of selecting
any given sequence of the class considered.

Most frequently this distribution law is obtained by assigning a
definite probability of appearance to each image of the considered
set at each step of the training; usually these probabilities are
taken to be identical at all steps, i.e., the training sequence is a
series of independent experiments on the selection of images with
constant probabilities assigned to each image. Hereafter we shall
limit ourselves to this case.

The effectiveness of training in a given class A of perceptrons with
the aid of a given class B of training sequences is defined as the
probability of correct recognition of the next image p applied to a
perceptron randomly selected from the class A after the preliminary
application to it of a training sequence randomly selected from the
class B. We distinguish two forms of effectiveness. The so-called
total effectiveness of training is obtained when the image p is
selected at random (with the a priori fixed probabilities of
appearance of the various images used in establishing the distribution
law in the class of training sequences). The training effectiveness
with respect to a single image q is obtained when the next image
presented to the perceptron for recognition is precisely the image q.
If the probabilities of recognition errors are the same for all the
images, then the total training effectiveness obviously coincides with
the individual effectiveness of training with respect to any image.

In the following section we shall undertake a theoretical study of
the training effectiveness of discrete perceptrons with α-law encour-
agement. For the moment we note that experiments have shown the
training effectiveness of all the types of perceptrons described above
to be relatively low. Therefore, the algorithms for pattern recognition
training used in practice normally introduce several additional
improvements in comparison with the perceptron scheme. For example, in
Roberts' adaptive scheme [67], which on the whole is quite similar to
the scheme of the perceptron with α-law encouragement, a considerable
improvement of training effectiveness is achieved by preliminary
normalization of the image (in other words, by automatically shifting
the image to the center of the retina and reducing it to a standard
size). Methods for the preliminary processing of features, multistage
perceptron schemes, etc., are also used.

The deficiencies of the perceptron and the ways of removing them 
will be clearer after acquaintance with the following two sections in 
which we consider some questions associated with its behavior in the 
training and self-training regimes. 

§6. THEORY OF TRAINING OF DISCRETE α-PERCEPTRONS

In the present section we shall outline the basic features of the
theory of perceptron training. Here we shall limit ourselves to
discrete perceptrons with generalized α-law encouragement operating in
the training regime (and not the self-training regime!), without
stipulating this circumstance in each case. In this case the training
theory is simpler and more transparent, since it is possible not to
follow the functioning of each individual neuron but to limit
ourselves to certain integral characteristics.

In the variant of the theory which we adopt, this integral charac-
teristic is the so-called characteristic tensor of the perceptron. We
emphasize at once that the use of the term "tensor" in this case is
not related to any law of transformation of its components under a
change of the coordinate system; it serves only as the name of a
certain integer-valued table with three indices. To describe this
table we number all the images presented to the perceptron under
consideration by the numbers from 1 to m, and all the patterns into
which these images are subdivided by the numbers from 1 to g. Then the
characteristic tensor of the perceptron is the ensemble of components
$T^k_{ij}$, where the indices i and j run through the values from 1 to
m and the index k runs through the values from 1 to g; $T^k_{ij}$
denotes the number of neurons of the kth pattern which are stimulated
by both the ith and the jth images.

The characteristic tensor of a class of perceptrons is defined in
essentially the same way. The only difference is that its components
$T^k_{ij}$ will in this case be not determinate but random quantities,
whose distribution laws are determined in an obvious fashion by the
distribution law characterizing the method of connecting the neurons
to the retina.

These definitions imply the validity of the relation

$$T^k_{ij} = T^k_{ji} \quad (i, j = 1, 2, \ldots, m;\ k = 1, 2, \ldots, g). \qquad (76)$$

It is also clear that any "diagonal" element of the tensor, for
example $T^k_{ii}$, is the number of neurons of a particular pattern
(the kth in the present case) which are stimulated under the action of
the ith image. This implies the validity of the inequality

$$T^k_{ii} \ge T^k_{ij} \quad (i, j = 1, 2, \ldots, m;\ k = 1, 2, \ldots, g). \qquad (77)$$
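For a concrete perceptron the characteristic tensor can be computed by
brute force, after which relations (76) and (77) can be checked on the
result. The sketch uses hypothetical neuron tuples (pattern, stimu-
lating receptor indices, inhibiting receptor indices, threshold) and
0-based indices, so `T[k][i][j]` stands for $T^{k+1}_{i+1,j+1}$.

```python
def stimulated(neuron, image):
    """A discrete neuron fires when sum x - sum y reaches its threshold p."""
    _, excit, inhib, p = neuron
    return sum(image[r] for r in excit) - sum(image[r] for r in inhib) >= p


def characteristic_tensor(neurons, images, n_patterns):
    """T[k][i][j] = number of neurons of pattern k stimulated by both image i and image j."""
    m = len(images)
    T = [[[0] * m for _ in range(m)] for _ in range(n_patterns)]
    for neuron in neurons:
        fired = [stimulated(neuron, img) for img in images]
        for i in range(m):
            if fired[i]:
                for j in range(m):
                    if fired[j]:
                        T[neuron[0]][i][j] += 1
    return T


def satisfies_76_77(T):
    """Symmetry T[k][i][j] == T[k][j][i] and diagonal dominance T[k][i][i] >= T[k][i][j]."""
    return all(T[k][i][j] == T[k][j][i] and T[k][i][i] >= T[k][i][j]
               for k in range(len(T)) for i in range(len(T[k])) for j in range(len(T[k])))


neurons = [(0, (0,), (), 1), (0, (1,), (), 1), (1, (0, 1), (), 2)]
images = [(1, 0), (1, 1), (0, 1)]
T = characteristic_tensor(neurons, images, 2)
```

Both relations hold for any perceptron, since each neuron contributes
to a symmetric block of pairs (i, j) that includes the diagonal.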
We shall term a perceptron or a class of perceptrons symmetrical if
the components of the characteristic tensor do not depend on the upper
index, i.e., if the following relation is valid:

$$T^{k_1}_{ij} = T^{k_2}_{ij} \quad (k_1, k_2 = 1, 2, \ldots, g;\ i, j = 1, 2, \ldots, m). \qquad (78)$$

In this case the upper index is redundant, so that it is natural to
characterize symmetrical perceptrons and classes of perceptrons not by
the three-index table $(T^k_{ij})$ but by the two-index table
$(T_{ij})$, where $T_{ij} = T^1_{ij} = T^2_{ij} = \ldots = T^g_{ij}$,
which we shall term the characteristic matrix of the perceptron (or
class of perceptrons).


Let us introduce still another notation. For any (finite) sequence
of images ζ we use $U^k_i(\zeta)$ to denote the output signal of the
summator of the kth pattern which is induced in the perceptron under
consideration by the ith image shown after training the perceptron
with the training sequence ζ. We use $U^k_i$ to denote the correspond-
ing signal prior to the beginning of the training process, i.e., the
signal $U^k_i(\zeta)$ for the case when the training sequence ζ is
empty (has length zero).

The quantities $U^k_i(\zeta)$ will obviously be determinate when a
particular perceptron is selected and random when a class of percep-
trons is considered. Sometimes it is also advisable to treat the
sequence ζ as a random sequence running through a class of training
sequences.

A distinctive feature of the α-law of perceptron training is the
property of commutativity of the training process, expressed by the
following proposition.

Theorem 1. In a perceptron (or a class of perceptrons) with α-law
encouragement, the output signal $U^k_i(\zeta)$ of the summator of the
kth pattern under the action of the ith image after training with any
sequence ζ does not change if an arbitrary permutation of the images
composing ζ is performed. This is valid for any image i and any
pattern k.

Indeed, the input signals of the neurons of the kth pattern induced
by the ith image are obviously not altered in the training process, so
they remain the same after showing the training sequence ζ or any
other sequence ζ'. Thus, the variation of the output signal of the
summator in the training process is due only to the change in the
weights of the neurons. By the definition of generalized α-law
encouragement, the variations of the neuron weights upon the showing
of any image in the training process (but, generally speaking, not in
the self-training process) do not depend on the place which the given
image occupies in the training sequence. Since the overall increment
of the weight of any neuron in the training process is simply the sum
of the increments at each step of the process, theorem 1 is thereby
completely proved.
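Theorem 1 is easy to confirm numerically. This sketch (hypothetical
names; neurons as tuples (pattern, stimulating receptor indices,
inhibiting receptor indices, threshold)) trains copies of one small
perceptron on every permutation of a short training sequence and
collects the resulting weight vectors. Since the summator signals
depend on the weights only, a single resulting weight vector implies
identical $U^k_i(\zeta)$ for all orders.

```python
from itertools import permutations


def stimulated(neuron, image):
    """A discrete neuron fires when sum x - sum y reaches its threshold p."""
    _, excit, inhib, p = neuron
    return sum(image[r] for r in excit) - sum(image[r] for r in inhib) >= p


def train_alpha(neurons, weights, sequence, a, b):
    """Return the weights after alpha-law training on (image, pattern) pairs."""
    w = list(weights)
    for image, target in sequence:
        for idx, neuron in enumerate(neurons):
            if stimulated(neuron, image):
                w[idx] += a if neuron[0] == target else -b
    return w


def weights_after_all_orders(neurons, weights, sequence, a, b):
    """Set of weight vectors obtained over all orderings of the sequence.
    Use exactly representable a and b (e.g. 1.0 and 0.5) so that floating-
    point rounding cannot depend on the order of the additions."""
    return {tuple(train_alpha(neurons, weights, list(order), a, b))
            for order in permutations(sequence)}


neurons = [(0, (0,), (), 1), (1, (1,), (), 1), (1, (0, 1), (), 2)]
seq = [((1, 0), 0), ((1, 1), 1), ((0, 1), 1), ((1, 1), 0)]
results = weights_after_all_orders(neurons, [0.0, 0.0, 0.0], seq, 1.0, 0.5)
```

All 24 orderings of this sequence lead to the same weights. A similar
experiment in the self-training regime need not give a single result,
because there the target of each step depends on the current weights.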

By theorem 1 we can characterize any training sequence ζ by the
integer vector $v = (v_1, v_2, \ldots, v_m)$ whose ith component (for
any i = 1, 2, ..., m) is equal to the number of occurrences of the ith
image in the sequence ζ. Let us call this vector the characteristic
vector of the sequence ζ. A class of training sequences can also be
specified with the aid of a characteristic vector; however, the
components of the vector in this case will, generally speaking, be not
determinate but random quantities.

For the description of the perceptron training process with (gen-
eralized) α-law encouragement it is sufficient to specify the original
perceptron (or class of perceptrons) only by its characteristic tensor
$(T^k_{ij})$ and the matrix of initial signals of the pattern summators
$(U^k_i)$ (i, j = 1, 2, ..., m; k = 1, 2, ..., g). The training
sequence ζ (or class of training sequences) is specified by its
characteristic vector $(v_1, v_2, \ldots, v_m)$. In the general case
all the quantities $T^k_{ij}$, $U^k_i$, $v_j$ will be random. However,
most frequently we consider various particular cases in which some of
the indicated quantities are determinate. We note that we will usually
include in the definition of the perceptron an indication of the
selection of the image set and of the training sequences.

In order to avoid confusing the images with the patterns, we shall
designate the patterns by Latin letters and the images, as before, by
their numbers. Let us consider the question of determining the output
signals $U_i^P(\xi)$ of the pattern summators. It is easy to see that with
(generalized) α-law encouragement the quantity $U_i^P(\xi)$ is the sum of
the initial signal $U_i^P$ and the increments of the weights of all the
neurons of the Pth pattern which are stimulated by the ith image, taken
over all the steps of the training process.

Characterizing the class of training sequences by the characteristic
vector $(v_1, v_2, \ldots, v_m)$, it is not difficult to find the expression
for the overall increment of the quantity $U_i^P(\xi)$ obtained as the
result of $v_j$ showings of the jth image. It follows from the definition
of (generalized) α-law encouragement with the constants a and b that
with each showing of the jth image any neuron of the Pth pattern which
is stimulated by this image will increase its weight by the amount a if
$j \in P$, and will reduce its weight by the amount b if $j \notin P$. The
total number of neurons of the Pth pattern which participate in the
formation of the output signal $U_i^P(\xi)$ and which are stimulated by
the jth image is clearly equal to $T_{ij}^P$. Thus, the total increment of
$U_i^P(\xi)$ as the result of $v_j$ showings of the jth image is expressed
by $aT_{ij}^P v_j$ if $j \in P$, and by $-bT_{ij}^P v_j$ if $j \notin P$.
Therefore the following proposition is valid.

Theorem 2. Let there be given a discrete perceptron (or class of
discrete perceptrons) with the characteristic tensor $\|T_{ij}^P\|$ and the
matrix of initial output signals of the pattern summators $\|U_i^P\|$
($i, j = 1, 2, \ldots, m$; $P \in R$). If (generalized) α-law encouragement
with the constants a, b operates in the considered perceptron (class of
perceptrons), then after training with the sequence (or class of
sequences) $\xi$ with the characteristic vector $(v_1, v_2, \ldots, v_m)$,
for any pattern P and any image i the output signal $U_i^P(\xi)$ of the
summator of the Pth pattern under the action of the ith image is
expressed by the equation

$$U_i^P(\xi) = U_i^P + a\sum_{j \in P} T_{ij}^P v_j - b\sum_{j \notin P} T_{ij}^P v_j. \qquad (79)$$
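As a check on equation (79), here is a small Python sketch (our own illustration, not from the text) that trains a toy perceptron step by step under the generalized α-law as described above: at each showing of image j, every neuron of pattern P stimulated by j gains a if j belongs to P and loses b otherwise. Each neuron is modelled simply as the set of images that stimulate it; the function names, the random toy instance and the constants a = 3, b = 2 are hypothetical. Shuffling the sequence leaves the result unchanged, as theorem 1 asserts.

```python
import random
from collections import Counter

def outputs(neurons, weights, m):
    """U[P][i]: sum of the weights of the neurons of pattern P stimulated by image i."""
    return {P: [sum(w for s, w in zip(neurons[P], weights[P]) if i in s)
                for i in range(m)]
            for P in neurons}

def train(neurons, weights, pattern_of, seq, a, b):
    """Step-by-step generalized alpha-law: +a if the shown image j is in P, -b otherwise."""
    weights = {P: list(ws) for P, ws in weights.items()}
    for j in seq:
        for P in neurons:
            delta = a if pattern_of[j] == P else -b
            for k, s in enumerate(neurons[P]):
                if j in s:  # only neurons stimulated by image j change their weight
                    weights[P][k] += delta
    return weights

def equation_79(neurons, U0, pattern_of, seq, a, b, m):
    """Closed form (79), using only T^P_ij and the characteristic vector v."""
    v = Counter(seq)  # characteristic vector: v[j] = occurrences of image j
    U = {}
    for P in neurons:
        T = [[sum(1 for s in neurons[P] if i in s and j in s) for j in range(m)]
             for i in range(m)]
        U[P] = [U0[P][i] + sum((a if pattern_of[j] == P else -b) * T[i][j] * v[j]
                               for j in range(m))
                for i in range(m)]
    return U

# Hypothetical toy instance: 6 images, patterns 'P' and 'Q', 20 random neurons per pattern.
rng = random.Random(1)
m = 6
pattern_of = ['P'] * 3 + ['Q'] * 3
neurons = {P: [set(rng.sample(range(m), rng.randint(1, 3))) for _ in range(20)]
           for P in ('P', 'Q')}
weights0 = {P: [rng.randint(-2, 2) for _ in neurons[P]] for P in neurons}
seq = [rng.randrange(m) for _ in range(40)]
a, b = 3, 2

U0 = outputs(neurons, weights0, m)
stepwise = outputs(neurons, train(neurons, weights0, pattern_of, seq, a, b), m)
shuffled = seq[:]
rng.shuffle(shuffled)
print(stepwise == equation_79(neurons, U0, pattern_of, seq, a, b, m))  # True
print(stepwise == outputs(neurons, train(neurons, weights0, pattern_of, shuffled, a, b), m))  # True
```

Integer weights keep the comparison exact, so the two computations agree term by term rather than merely approximately.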
For any image i we shall use $P_i$ to denote the pattern to which the
image i belongs in the original classification of the images. Using this
notation, it is not difficult to write out the necessary and sufficient
condition for the perceptron to classify the ith image correctly after
training. This condition is obviously the satisfaction of the inequality

$$U_i^{P_i}(\xi) > U_i^P(\xi) \quad \text{for all } P \neq P_i. \qquad (80)$$


Using relations (79) and (80) it is not difficult to calculate the
perceptron training effectiveness in any specific case. These relations
take a particularly simple form for symmetric perceptrons. Indeed, since
in this case $T_{ij}^P = T_{ij}^Q = T_{ij}$ for all patterns P and Q,
relations (79) and (80) can be written in the form of the system of
inequalities

$$U_i^{P_i} + a\sum_{j \in P_i} T_{ij} v_j - b\sum_{j \notin P_i} T_{ij} v_j > U_i^P + a\sum_{j \in P} T_{ij} v_j - b\sum_{j \notin P} T_{ij} v_j \quad (P \neq P_i). \qquad (81)$$
In inequality (81) the terms of the form $bT_{ij}v_j$ for which j is
contained neither in $P_i$ nor in P appear on both the left and right
sides and therefore cancel. After their exclusion we obtain the simpler
relations, equivalent to relations (81):

$$U_i^{P_i} + (a + b)\sum_{j \in P_i} T_{ij} v_j > U_i^P + (a + b)\sum_{j \in P} T_{ij} v_j \quad \text{for all } P \neq P_i. \qquad (82)$$


Inequalities (82) give the necessary and sufficient conditions for the
correct classification of the ith image by a symmetric perceptron with
the characteristic matrix $\|T_{ij}\|$ and the initial signals of the
pattern summators $U_i^P$ after training with a sequence having the
characteristic vector $(v_1, v_2, \ldots, v_m)$. These inequalities can be
simplified still further for perceptrons with symmetric initial
conditions, i.e., those perceptrons (or classes of perceptrons) for which
the conditions

$$U_i^P = U_i^Q \quad \text{for all } i = 1, 2, \ldots, m \qquad (83)$$

are satisfied for all P and Q.

Using relations (83) and recalling that, by the definition of the
generalized α-law, $a + b \neq 0$ (since a and b are nonnegative, in fact
$a + b > 0$, so dividing (82) by $a + b$ preserves the inequalities), we
come to the following result.

Theorem 3. Let there be given any discrete symmetric perceptron (or class
of perceptrons) with the characteristic matrix $\|T_{ij}\|$ and with
symmetric initial conditions, in which (generalized) α-law encouragement
operates. Then the necessary and sufficient conditions for correct
recognition of any ith image by the perceptron (class of perceptrons)
after training with the sequence (class of sequences) with the
characteristic vector $(v_1, v_2, \ldots, v_m)$ are expressed by the
relations

$$\sum_{j \in P_i} T_{ij} v_j > \sum_{j \in P} T_{ij} v_j \quad \text{for all } P \neq P_i. \qquad (84)$$


Corollary. The training effectiveness of symmetric discrete perceptrons
with symmetric initial conditions, in which (generalized) α-law
encouragement operates, does not depend on the choice of the
(nonnegative) constants a and b which characterize the law.

Thus, in studying symmetric discrete perceptrons with symmetric initial
conditions we can, without loss of generality, use conventional α-law
encouragement with the constants (1, 0) rather than the generalized
α-law with the constants (a, b).
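The cancellation behind the corollary can be checked mechanically. In this Python sketch (illustrative only; the helper names and the random test data are ours), `condition_81` evaluates inequalities (81) literally, with the common initial signal u[i] standing in for symmetric initial conditions, while `condition_84` evaluates (84), in which neither a, b nor the initial signals appear. For any nonnegative a, b with a + b > 0 the two verdicts coincide.

```python
import random

def condition_81(T, v, pattern_of, u, i, a, b):
    """Inequalities (81) for image i, symmetric initial conditions U_i^P = u[i] for all P."""
    m = len(v)

    def side(P):
        plus = sum(T[i][j] * v[j] for j in range(m) if pattern_of[j] == P)
        minus = sum(T[i][j] * v[j] for j in range(m) if pattern_of[j] != P)
        return u[i] + a * plus - b * minus

    own = pattern_of[i]
    return all(side(own) > side(P) for P in set(pattern_of) if P != own)

def condition_84(T, v, pattern_of, i):
    """Inequalities (84): no a, b and no initial signals appear."""
    m = len(v)

    def total(P):
        return sum(T[i][j] * v[j] for j in range(m) if pattern_of[j] == P)

    own = pattern_of[i]
    return all(total(own) > total(P) for P in set(pattern_of) if P != own)

rng = random.Random(0)
m = 6
pattern_of = [0, 0, 1, 1, 2, 2]  # three hypothetical patterns of two images each
mismatches = 0
for _ in range(500):
    T = [[0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i, m):
            T[i][j] = T[j][i] = rng.randint(0, 4)  # a characteristic matrix is symmetric
    v = [rng.randint(0, 3) for _ in range(m)]
    u = [rng.randint(-5, 5) for _ in range(m)]
    a, b = rng.randint(0, 3), rng.randint(1, 3)  # nonnegative constants with a + b > 0
    mismatches += sum(condition_81(T, v, pattern_of, u, i, a, b) != condition_84(T, v, pattern_of, i)
                      for i in range(m))
print(mismatches)  # 0
```

Algebraically, each side of (81) equals u_i + (a + b) times the pattern sum minus b times the same total over all images, which is why only the sign of a + b matters.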

In specific calculations of training effectiveness in classes of
perceptrons it is usually assumed that all the neurons are connected to
the retina independently of one another, and that the probability
$\alpha_{ij}$ of a neuron being connected so that it is stimulated by both
the ith and the jth image is the same for all the neurons, for any fixed
values of i and j.
If we use $T^P$ to denote the total number of neurons of the Pth pattern,
then the component $T_{ij}^P$ of the characteristic tensor of the class of
perceptrons being considered can be treated as the number of occurrences
of an event having the probability $\alpha_{ij}$ in $T^P$ independent
trials. As a result of theorem 2 from §2 of the present chapter, the
mathematical expectation $E(T_{ij}^P)$ and the variance $D(T_{ij}^P)$ of
the random quantity $T_{ij}^P$ are expressed by the equations

$$E(T_{ij}^P) = T^P\alpha_{ij}; \quad D(T_{ij}^P) = T^P\alpha_{ij}(1 - \alpha_{ij}) \qquad (i, j = 1, 2, \ldots, m;\ P \in R). \qquad (85)$$

With sufficiently large values of $T^P$ the quantity $T_{ij}^P$ itself can
be considered normally distributed. We note also that in the case of
symmetric perceptrons the quantities $T^P$ will be equal to one another
for different patterns P. Therefore we shall denote them simply by the
letter T, dropping the index P. We shall term the matrix $\|\alpha_{ij}\|$
the basic probability matrix of the class of perceptrons being
considered.

Similarly, a class of training sequences K formed by selecting an image
at random at each training step, regardless of the images selected at
the remaining steps, can be characterized by the probability vector
$(\beta_1, \beta_2, \ldots, \beta_m)$ of the class. For any
$i = 1, 2, \ldots, m$ the ith component $\beta_i$ of this vector is equal to
the probability that the ith image is selected as the image shown at any
given step of training. In this case the ith component $v_i$ of the
characteristic vector of the class K is the number of occurrences of an
event having the probability $\beta_i$ in N independent trials, where N is
the length of the training sequences of the class K (by the definition
of a class of training sequences, all the sequences in the class have
the same length).
With sufficiently large values of N, for any image i the quantity $v_i$
can be considered normally distributed, and its mathematical expectation
and variance are given by the equations

$$E(v_i) = N\beta_i; \quad D(v_i) = N\beta_i(1 - \beta_i) \qquad (i = 1, 2, \ldots, m). \qquad (86)$$

We note that the random quantities $v_i$, and also the random quantities
$T_{ij}$, are not, generally speaking, independent for different values of
i and j, which creates additional difficulties in calculating the
probability of correct operation of the perceptron using relations (81)
and (84). However, in many cases we can avoid these difficulties by
introducing certain additional assumptions.
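The dependence mentioned above is easy to see numerically. The following Python sketch (an illustration of ours; the probability vector beta, the length N = 200 and the seed are hypothetical) draws many random training sequences of a class K, compares the sample mean and variance of each $v_i$ with (86), and also estimates the covariance of $v_1$ and $v_2$, which for independent trials is $-N\beta_1\beta_2$ because the components always sum to N.

```python
import random

def characteristic_vector(seq, m):
    """v[i] = number of occurrences of image i in the sequence."""
    v = [0] * m
    for image in seq:
        v[image] += 1
    return v

def sample_stats(beta, N, trials, rng):
    """Sample means and variances of every v_i, plus the covariance of v_0 and v_1."""
    m = len(beta)
    samples = [characteristic_vector(rng.choices(range(m), weights=beta, k=N), m)
               for _ in range(trials)]
    mean = [sum(s[i] for s in samples) / trials for i in range(m)]
    var = [sum((s[i] - mean[i]) ** 2 for s in samples) / trials for i in range(m)]
    cov01 = sum((s[0] - mean[0]) * (s[1] - mean[1]) for s in samples) / trials
    return mean, var, cov01

beta = [0.5, 0.3, 0.2]  # hypothetical probability vector of the class K
N = 200
mean, var, cov01 = sample_stats(beta, N, 5000, random.Random(42))
for i, b in enumerate(beta):
    print(i, round(mean[i], 2), N * b, round(var[i], 2), N * b * (1 - b))
print(round(cov01, 2), -N * beta[0] * beta[1])
```

The negative covariance is exactly the kind of dependence that prevents treating the inequalities (84) for different images as independent events.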

We shall demonstrate this situation using several examples. 

Example 1. We consider the discrete perceptron A with neurons of the
(1, 1, 1) type, having a regular square $(n \times n)$-retina and 2n images,
which are chosen to be the n horizontal lines of length n, combined into
the pattern P, and the n vertical lines of length n, combined into the
pattern Q. All the images have the same probability (equal to 1/2n) of
appearing in the training sequence. We assume that the perceptron A is
complete. This means that in both the neuron set of the Pth pattern and
the neuron set of the Qth pattern, for any way of connecting a neuron to
the retina there is precisely one neuron having exactly that connection
with the retina. α-law encouragement with the constants a and b operates
in the perceptron A, and the initial weights of the neurons are equal to
zero.

We are required to find the training effectiveness of the perceptron A
in the class of random training sequences of length 2N containing
precisely N showings of the images of the first pattern and N showings
of the images of the second pattern.

Solution. The perceptron will obviously be symmetric and will therefore
be completely characterized by its characteristic matrix $\|T_{ij}\|$. It
is easy to see that a neuron is stimulated by the ith image (a vertical
or horizontal line) if and only if its stimulating input is connected to
a receptor lying on the corresponding line and its inhibiting input is
connected to a receptor lying off this line. For any given i there are
in all $n(n^2 - n) = n^2(n - 1)$ different connections of this sort. In
view of the completeness of the perceptron A, the following equation is
valid:

$$T_{ii} = n^2(n - 1) \qquad (i = 1, 2, \ldots, 2n). \qquad (87)$$

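Let us assume that the numbers from 1 to n designate the horizontal
lines (images of the pattern P) and the numbers from n + 1 to 2n
designate the vertical lines (images of the pattern Q). By analogy with
the way the expression for $T_{ii}$ was found, we find two more
expressions:

$$T_{ij} = 0 \qquad (88)$$

if i and j are different images of the same pattern (two parallel lines
have no common receptor, so no stimulating input can lie on both);

$$T_{ij} = (n - 1)^2 \qquad (89)$$

if i and j are images of different patterns (the stimulating input must
lie at the single intersection of the two lines, and the inhibiting input
at one of the $n^2 - (2n - 1) = (n - 1)^2$ receptors lying on neither line).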

Using E to denote the unit matrix of order n and D to denote the square
matrix of order n all of whose elements are equal to unity, we represent
the characteristic matrix M of the perceptron being considered in the
form

$$M = \begin{pmatrix} n^2(n - 1)E & (n - 1)^2 D \\ (n - 1)^2 D & n^2(n - 1)E \end{pmatrix}. \qquad (90)$$

Let $(v_1, v_2, \ldots, v_{2n})$ be the characteristic vector of the class
of training sequences being considered. By the assumed conditions, the
components of this vector satisfy

$$v_1 + v_2 + \ldots + v_n = v_{n+1} + v_{n+2} + \ldots + v_{2n} = N. \qquad (91)$$

As a result of theorem 3, we write the necessary and sufficient condition
for correct recognition of any given image i:

$$\sum_{j \in P_i} T_{ij} v_j > \sum_{j \notin P_i} T_{ij} v_j, \qquad (92)$$

or, taking account of relations (87)-(89) and (91),

$$n^2(n - 1)v_i > (n - 1)^2 N. \qquad (93)$$

We write relation (93) in the equivalent form

$$v_i > \frac{N}{n}\left(1 - \frac{1}{n}\right). \qquad (94)$$

The probability of the appearance of the ith image at each of the N
showings of representatives of the pattern $P_i$ is equal to 1/n.
Therefore for the mathematical expectation and the variance of the
quantity $v_i$ we obtain the expressions

$$E(v_i) = \frac{N}{n}; \quad D(v_i) = \frac{N}{n}\left(1 - \frac{1}{n}\right). \qquad (95)$$

In view of theorem 3 from §3 of the present chapter, with sufficiently
large N the probability $q_i$ of satisfaction of inequality (94) can be
calculated from the equation

$$q_i \approx 0.5 + \frac{1}{\sqrt{2\pi}}\int_0^k e^{-x^2/2}\,dx \qquad (i = 1, 2, \ldots, 2n), \qquad (96)$$

where k is the ratio of the modulus of the difference between the right
side of inequality (94) and the mathematical expectation $E(v_i)$ to the
mean square deviation of the quantity $v_i$, equal to the square root of
the variance. In other words,

$$k = \frac{\left|\frac{N}{n}\left(1 - \frac{1}{n}\right) - \frac{N}{n}\right|}{\sqrt{\frac{N}{n}\left(1 - \frac{1}{n}\right)}} = \sqrt{\frac{N}{n^2(n - 1)}}. \qquad (97)$$

Since the value of $q_i$ does not depend on i, it coincides with the
probability q of correct recognition by the perceptron A of any randomly
selected image after the preliminary showing of a randomly selected
training sequence of length 2N from the class of sequences being
considered.
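A Python sketch of (96)-(97), using only the standard library: `k_value` implements (97), `q_normal` evaluates the normal approximation (96) through the error function, and, as a cross-check that is not in the text, `q_exact` sums the binomial distribution of $v_i$ (N trials, probability 1/n) in log space. The sample values of n are arbitrary; N = 9n^3 corresponds to training sequences of length 18n^3.

```python
import math

def k_value(n, N):
    """Equation (97): |threshold - E(v_i)| / sqrt(D(v_i)) = sqrt(N / (n^2 (n - 1)))."""
    mean = N / n
    sd = math.sqrt(N / n * (1 - 1 / n))
    threshold = N / n * (1 - 1 / n)  # right side of (94)
    return abs(threshold - mean) / sd

def q_normal(k):
    """Equation (96): 0.5 + (1/sqrt(2 pi)) * integral_0^k exp(-x^2/2) dx."""
    return 0.5 * (1 + math.erf(k / math.sqrt(2)))

def q_exact(n, N):
    """P(v_i > (n - 1) N / n^2) for v_i ~ Binomial(N, 1/n), summed in log space."""
    p = 1 / n
    first = (n - 1) * N // n**2 + 1  # smallest integer strictly above the threshold
    total = 0.0
    for v in range(first, N + 1):
        log_pmf = (math.lgamma(N + 1) - math.lgamma(v + 1) - math.lgamma(N - v + 1)
                   + v * math.log(p) + (N - v) * math.log(1 - p))
        total += math.exp(log_pmf)
    return total

for n in (5, 10, 20):
    N = 9 * n**3  # training sequences of length 2N = 18 n^3
    k = k_value(n, N)
    print(n, round(k, 3), round(q_normal(k), 5), round(q_exact(n, N), 5))
```

For N = 9n^3 formula (97) gives k = 3 sqrt(n/(n - 1)), so k never falls below 3.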

The value of the probability q is precisely the overall effectiveness of
the training of the perceptron A under the given conditions. We present
a table of the values of the probability q for several values of k:

[Table: q versus k; legible entries 0.841, 0.977, 0.960]

Thus, in order to reduce the probability of error of the perceptron
being considered to 0.1% with random selection of the training sequence,
it is necessary to use sequences of very great length, equal to $18n^3$
(that is, $N = 9n^3$, for which $k = 3\sqrt{n/(n - 1)} \geq 3$). At the same
time, we see immediately from inequality (92) that we can reduce this
probability to zero (obtaining absolutely accurate recognition) by
showing each image exactly once, i.e., by using a sequence having a
length of only 2n. This example gives a striking demonstration of the
inadvisability of using random training sequences. At the same time it
indicates serious differences between the learning mechanism described
and the learning mechanism realized in the human brain.
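The claim that a single showing of every image suffices follows from (93) with N = n and v_i = 1: it becomes n^2(n - 1) > (n - 1)^2 n, i.e. n > n - 1. A short Python sketch (names are ours) checks condition (92) for every image, for a range of n, using the closed-form matrix (87)-(89).

```python
def recognized_after_single_showing(n):
    """Check (92) for every image of example 1 when each of the 2n images is shown once."""
    m = 2 * n

    def same(i, j):
        # images 0..n-1 are horizontal lines (pattern P), n..2n-1 vertical lines (pattern Q)
        return (i < n) == (j < n)

    T = [[n**2 * (n - 1) if i == j else (0 if same(i, j) else (n - 1) ** 2)
          for j in range(m)] for i in range(m)]
    v = [1] * m  # characteristic vector of a sequence showing every image exactly once
    for i in range(m):
        own = sum(T[i][j] * v[j] for j in range(m) if same(i, j))
        other = sum(T[i][j] * v[j] for j in range(m) if not same(i, j))
        if not own > other:
            return False
    return True

print(all(recognized_after_single_showing(n) for n in range(2, 30)))  # True
```

The degenerate case n = 1 fails, since both sides of (92) are then zero.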

Indeed, the latter mechanism has a marked capability for extrapolating
experience, i.e., for correctly recognizing images which never appeared
in the training process. At the same time, the perceptron described in
the example considered does not guarantee correct image recognition
(with random organization of the training process) even when the average
number of displays of each image reaches a very large number (of the
order of $n^2$).

This conclusion is associated, of course, to a certain degree with the
specific nature of the example. However, it is not difficult to note
that with purely random connection of the (1, 1, 1)-neurons to the
retina (excluding the connection of both inputs of a neuron to the same
receptor) the mathematical expectations of the components of the
characteristic matrix will differ from the components of the
characteristic matrix of the complete perceptron only by a constant
factor, which is not significant from the point of view of calculating
the training effectiveness. Therefore, with random connection of the
neurons to the retina the most probable behavior of the resulting
perceptrons will be precisely that of the complete perceptron described
above.

Thus, random organization of the connections of the neurons to the
retina cannot, generally speaking, provide high-quality perceptron
functioning. It follows from theorem 3 that the capability of the
perceptron for extrapolating experience increases with increase of those
components of the characteristic matrix whose indices belong to the same
pattern, and with reduction of those components whose indices belong to
different patterns.

We shall say that a perceptron has absolute capability for extrapolation
if, for any pattern P and any image i from this pattern, training by any
sequence containing the image i at least once leads to correct
recognition of all the images of this pattern. We obtain the following
result.

Theorem 4. In order that a discrete symmetric perceptron with symmetric
initial conditions, in which (generalized) α-law encouragement operates,
have absolute capability for extrapolation, it is necessary and
sufficient that all the components $T_{ij}$ of the characteristic matrix
of the perceptron whose indices belong to the same pattern be nonzero,
and that all the components $T_{ij}$ whose indices belong to different
patterns be equal to zero.

Indeed, let us assume that the condition of the theorem is satisfied.
Then inequality (84) for the ith image will be valid if the value of
$v_j$ is nonzero for at least one image j from $P_i$: its right side is
then zero, while its left side is at least $T_{ij}v_j > 0$. By theorem 3,
this means that the perceptron being considered has absolute capability
for extrapolation.

Now let us assume that the condition of the theorem is not satisfied.
This leads to two cases: 1) for some pattern Q there is a pair of images
i, j belonging to it such that $T_{ij} = 0$; 2) there is a pair of images
k, r belonging to different patterns such that $T_{kr} \neq 0$. In the
first case, by theorem 3, a learning sequence composed exclusively of the
image j does not lead to correct recognition of the image i, since both
sides of (84) are then equal to zero. In the second case, let us consider
the learning sequence composed of one image k and any number $v_r$ larger
than $T_{kk}/T_{kr}$ of images r. Then, for the recognition of the image
k, substituting the indicated values into inequality (84) leads to the
inequality $T_{kk} > T_{kr}v_r$. In view of the choice of $v_r$ this
inequality is not valid, which by theorem 3 means that correct
recognition of the image k is impossible. Consequently, in both cases the
perceptron will not have absolute capability for extrapolation, q.e.d.
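Theorem 4 can be compared with the definition it characterizes on small examples. In this Python sketch (illustrative; the helper names, the 4-image layout and the random matrices are ours), `has_absolute_extrapolation` applies the criterion of the theorem, while `brute_force` applies the definition directly to every characteristic vector with entries 0..max_count, judging recognition by (84). With matrix entries at most 2, a count of 3 already exceeds $T_{kk}/T_{kr}$, so max_count = 3 is enough to expose every failure.

```python
import itertools
import random

def has_absolute_extrapolation(T, pattern_of):
    """Theorem 4: nonzero within each pattern, zero between different patterns."""
    m = len(T)
    return all((T[i][j] != 0) == (pattern_of[i] == pattern_of[j])
               for i in range(m) for j in range(m))

def recognized(T, v, pattern_of, i):
    """Inequality (84) for the ith image."""
    sums = {}
    for j, P in enumerate(pattern_of):
        sums[P] = sums.get(P, 0) + T[i][j] * v[j]
    return all(sums[pattern_of[i]] > s for P, s in sums.items() if P != pattern_of[i])

def brute_force(T, pattern_of, max_count):
    """The definition itself, over all characteristic vectors with entries 0..max_count."""
    m = len(T)
    for v in itertools.product(range(max_count + 1), repeat=m):
        for i in range(m):
            if v[i] >= 1 and not all(recognized(T, v, pattern_of, k)
                                     for k in range(m) if pattern_of[k] == pattern_of[i]):
                return False
    return True

rng = random.Random(7)
pattern_of = [0, 0, 1, 1]
disagreements = 0
for _ in range(100):
    T = [[0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(i, 4):
            T[i][j] = T[j][i] = rng.randint(0, 2)
    if has_absolute_extrapolation(T, pattern_of) != brute_force(T, pattern_of, 3):
        disagreements += 1
print(disagreements)  # 0
```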

Usually the images belonging to the same pattern are numbered with
consecutive integers. In this case it is natural to partition the
characteristic matrices of symmetric perceptrons into blocks
corresponding to the different patterns. Absolute capability for
extrapolation is achieved in this case when these matrices are
block-diagonal and the diagonal blocks contain no zero elements. This
form of the characteristic matrices is not always completely achievable;
however, any good approximation to it will, as a rule, require avoiding
completely random connection of the neurons to the retina. The effect
obtained as a result of this departure from random connection is best
demonstrated by an example.

Example 2. Find the training effectiveness of the perceptron B, which
differs from the perceptron A of example 1 only in that it retains only
those neurons both inputs of which are connected to receptors lying on
one horizontal or on one vertical line. The training conditions are the
same as in example 1.

Solution. The perceptron B, just like the perceptron A, will obviously be
symmetric. It is not difficult to find that the elements of its
characteristic matrix are given by the relations $T_{ii} = n(n - 1)$;
$T_{ij} = 0$ ($i, j = 1, 2, \ldots, 2n$; $i \neq j$). The condition of
correct recognition of the ith image is expressed by the condition
$T_{ii}v_i > 0$ or, what is the same, $v_i > 0$. In other words, for the
correct recognition of the ith image it is necessary and sufficient that
it has been shown to the perceptron at least once during training.

With N random displays of the images of one pattern, the probability
that the ith image does not appear in the training sequence is obviously
equal to $(1 - 1/n)^N \approx e^{-N/n}$, and the overall effectiveness of
the training is expressed by the number $q \approx 1 - e^{-N/n}$. In order
to reduce the probability of incorrect operation of the perceptron to
0.1%, as was done in example 1, it is sufficient to set N = 7n
($e^{-7} \approx 0.0009$), or, in other words, to use a training sequence of
length 14n. We recall that in the first example the same training
effectiveness was obtained only by using a training sequence of length
$18n^3$.

It is curious that such a sharp increase of the training effectiveness
is obtained not by complicating the perceptron but by simplifying it,
since the perceptron B is obtained from the perceptron A by discarding a
large number of neurons poorly connected to the retina. It is easy to
find that the total number of neurons in the perceptron A is
$2n^2(n^2 - 1)$, while that in the perceptron B is only $4n^2(n - 1)$. This
situation once again indicates the imperfection of the perceptron
learning mechanism and its significant difference from the learning
process which takes place in the human brain.
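A brute-force Python check of example 2 (an illustration of ours, with n = 5 chosen arbitrarily). Keeping only the connections whose two inputs lie on a common row or column gives a diagonal characteristic matrix with $T_{ii} = n(n - 1)$; counting connections, with one neuron per connection in each of the two patterns as in a complete perceptron, gives $2n^2(n^2 - 1)$ neurons for A and $4n^2(n - 1)$ for B. The last line compares the exact effectiveness $1 - (1 - 1/n)^N$ with its exponential approximation for N = 7n.

```python
import math

def line_images(n):
    """Horizontal lines (pattern P) followed by vertical lines (pattern Q)."""
    horizontal = [frozenset((r, c) for c in range(n)) for r in range(n)]
    vertical = [frozenset((r, c) for r in range(n)) for c in range(n)]
    return horizontal + vertical

def characteristic_matrix(connections, images):
    """T[i][j] = number of (1, 1, 1)-neurons stimulated by both image i and image j."""
    m = len(images)
    T = [[0] * m for _ in range(m)]
    for e, h in connections:
        fired = [i for i, img in enumerate(images) if e in img and h not in img]
        for i in fired:
            for j in fired:
                T[i][j] += 1
    return T

def connections(n, same_line_only):
    """Ordered (stimulating, inhibiting) receptor pairs; optionally only pairs on one line."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    return [(e, h) for e in cells for h in cells
            if e != h and (not same_line_only or e[0] == h[0] or e[1] == h[1])]

n = 5
T_B = characteristic_matrix(connections(n, True), line_images(n))
print(T_B[0][0], n * (n - 1))                                  # 20 20
print(2 * len(connections(n, False)), 2 * n**2 * (n**2 - 1))   # 1200 1200
print(2 * len(connections(n, True)), 4 * n**2 * (n - 1))       # 400 400
N = 7 * n
print(1 - (1 - 1 / n) ** N, 1 - math.exp(-N / n))  # exact vs. approximate effectiveness
```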
Let us consider another example of computing the learning effectiveness
in a class of perceptrons.

Example 3. Determine the training effectiveness of the class C of
discrete symmetric perceptrons with symmetric initial conditions subject
to generalized α-law encouragement. The retina, patterns and images are
the same as in example 1. The number of neurons of each of the two
patterns is equal to N. The inputs of all the neurons are connected
independently of one another, with equal probability, to any receptor of
the retina, excluding only the case of simultaneous connection of both
inputs of a neuron to the same receptor. The training sequence contains
each of the 2n images exactly once.

Solution. It is easy to see that the components $T_{ij}$ of the
characteristic matrix of the class C in which the indices i and j are
different images of the same pattern are equal to zero. The condition
for correct recognition of the ith image, given by theorem 3, is written
in our case as

$$T_{ii} > \sum_{j \notin P_i} T_{ij}. \qquad (98)$$

It is easy to see that the sets $M_{ij}$ of neurons of the same pattern
which are stimulated by both the ith and the jth image are disjoint for
different $j \neq i$, since the stimulating input of such a neuron must lie
at the intersection of the lines i and j. All these sets are contained,
of course, in the set $M_{ii}$. Since $T_{ij}$ is just the number of
elements of the set $M_{ij}$, for the satisfaction of inequality (98) it
is necessary and sufficient that among the neurons of the pattern $P_i$
there be at least one neuron which is stimulated by the ith image but is
not stimulated by any image of the opposite pattern (different from
$P_i$).

From the geometry of the images it follows directly that this condition
is satisfied by the neurons both of whose inputs are connected to the
same vertical line (if i is a horizontal line) or to the same horizontal
line (if i is a vertical line), with the stimulating input lying on the
line i. For any fixed i, out of the total number $n^2(n^2 - 1)$ of
different connections this condition is satisfied by only $n(n - 1)$
connections. The probability of the desired connection is therefore
equal to $n(n - 1)/n^2(n^2 - 1) = 1/n(n + 1)$, and the probability that
such a connection does not occur for any of the N neurons is equal to
$\left(1 - \frac{1}{n(n + 1)}\right)^N \approx e^{-N/n(n + 1)}$. Consequently,
the overall training effectiveness is expressed by the equation

$$q \approx 1 - e^{-N/n(n + 1)}.$$

If the number of neurons of each pattern is equal to $7n(n + 1)$, i.e.,
exceeds the total number of receptors by a factor of approximately 7,
then the probability of incorrect operation of a perceptron randomly
selected from the class C after training by display of all the images
once each will be equal to $e^{-7}$, which is about 0.001.
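A Monte Carlo sketch of example 3 in Python (illustrative only; n = 3, 20 neurons per pattern and the seeds are arbitrary choices). `favourable_count` enumerates the connections that are stimulated by the first horizontal line and by no vertical line, which should number n(n - 1) out of n^2(n^2 - 1). `estimate` then samples random neurons of one pattern and tests inequality (98) directly for that image; its frequency should approach $1 - (1 - p)^N$ with p = 1/n(n + 1).

```python
import math
import random

def stimulated(conn, image):
    """(1, 1, 1)-neuron: stimulating input on the image, inhibiting input off it."""
    e, h = conn
    return e in image and h not in image

def lines(n):
    horizontal = [frozenset((r, c) for c in range(n)) for r in range(n)]
    vertical = [frozenset((r, c) for r in range(n)) for c in range(n)]
    return horizontal, vertical

def favourable_count(n):
    """(connections stimulated by horizontal line 0 and by no vertical line, all connections)."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    horizontal, vertical = lines(n)
    good = [(e, h) for e in cells for h in cells if e != h
            and stimulated((e, h), horizontal[0])
            and not any(stimulated((e, h), v) for v in vertical)]
    return len(good), len(cells) * (len(cells) - 1)

def condition_98_holds(neurons, image, opposite):
    """T_ii > sum over the opposite pattern of T_ij, computed from the sampled neurons."""
    t_ii = sum(1 for c in neurons if stimulated(c, image))
    t_cross = sum(1 for c in neurons if stimulated(c, image)
                  for other in opposite if stimulated(c, other))
    return t_ii > t_cross

def estimate(n, num_neurons, trials, rng):
    cells = [(r, c) for r in range(n) for c in range(n)]
    horizontal, vertical = lines(n)
    hits = 0
    for _ in range(trials):
        # Each neuron's two inputs go to two distinct receptors chosen uniformly at random.
        neurons = [tuple(rng.sample(cells, 2)) for _ in range(num_neurons)]
        hits += condition_98_holds(neurons, horizontal[0], vertical)
    return hits / trials

n, num_neurons = 3, 20
p = 1 / (n * (n + 1))
print(favourable_count(n))
print(estimate(n, num_neurons, 4000, random.Random(3)),
      1 - (1 - p) ** num_neurons, 1 - math.exp(-num_neurons * p))
```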

As mentioned above, the construction of the theory of perceptron
learning reveals the basic differences between the learning process
realized by the perceptron and the actual learning process of the human
brain. Changing from discrete neurons to continuous ones, or replacing
α-law encouragement by β- or γ-law encouragement, does not significantly
alter this situation. The situation may be partially rectified by adding
to the processes realized in the perceptron the reconnection of those
neurons which interfere with, or do not significantly aid, the learning
process.

We can provide, for example, for periodic verification of the weights of
the neurons and random reconnection of the neurons with smaller weights.
Mechanisms of this sort are realized in Roberts' adapt [67] and the
Selfridge pandemonium [72]. They increase the equipment utilization
coefficient and reduce the number of neurons, which in perceptron schemes
with purely random connections reaches tremendously large values.

However, this is completely inadequate for explaining such a feature of
the adaptive functions of the brain as the use of particular features
distinguished in patterns already studied to accelerate the process of
learning to recognize new patterns containing all or some of these
features. It is easy to see that such a process can be realized in
multi-stage perceptrons, i.e., in those circuits in which the pattern
summators of the perceptron of a lower stage are used as the receptors
for the perceptron of the following stage. Here the perceptrons of the
lower stages are taught to recognize individual properties of the
patterns, and the perceptrons of the higher stages are trained to
recognize ensembles of these properties. Corresponding alterations and
complications of the laws of encouragement can be accomplished in many
different ways. We note that a scheme which essentially includes the
idea of the two-stage perceptron is used in the algorithm for teaching
the recognition of geometric figures described in the work of Glushkov,
Kovalevskiy and Rybak [29].

Introduction of these improvements still does not permit us to approach
the simulation of another important characteristic of the brain, namely
the establishment of the invariance of all the patterns with respect to
their displacement and change of size on the basis of limited
experience, using only a small part of all the patterns. To achieve any
success in this direction we must alter not only the construction of the
perceptron but also the very methodology of the learning process. To do
this we introduce the possibility of the recognizing device itself
participating in the organization of the learning sequence.

If, for example, the recognizing device A is shown several different
images as representatives of a particular pattern, then the device A
must be able to repeat the demonstration of these images as many times
as necessary to ensure their correct recognition in the future.
Moreover, the device must be able to repeat the display of those same
images subjected to the variations which the image of an object on the
retina of the eye usually undergoes with changes of the relative
position of the eye and the object being considered.

We can, of course, do something other than introduce the described
feedback, which permits the recognizing device to alter the learning
sequence. Instead, the recognizing devices themselves can be constructed
so that after the display of a particular image there is an increased
probability that at the following step the same image is displayed,
viewed perhaps at a different angle, or at least an image belonging to
the same pattern. In other words, in training recognizing devices we
must avoid constructing the learning process according to the scheme of
independent trials and go over to the more complex schemes described by
Markov chains.
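To make the contrast with independent trials concrete, here is a hypothetical Python generator of a Markov-chain training sequence of the kind suggested above. The transition rule is our own assumption, since the text fixes no particular chain: after an image is shown, with probability `stay` the next image is drawn from the same pattern (possibly the same image), otherwise uniformly from all images. With stay = 0 it reduces to independent uniform trials, and larger values make consecutive showings cluster within a pattern.

```python
import random

def markov_sequence(patterns, length, stay, rng):
    """patterns: list of lists of image numbers; returns a list of image numbers."""
    if length <= 0:
        return []
    all_images = [img for group in patterns for img in group]
    pattern_of = {img: k for k, group in enumerate(patterns) for img in group}
    current = rng.choice(all_images)
    sequence = [current]
    while len(sequence) < length:
        if rng.random() < stay:
            # remain within the pattern of the previously shown image
            current = rng.choice(patterns[pattern_of[current]])
        else:
            current = rng.choice(all_images)
        sequence.append(current)
    return sequence

def pattern_switches(sequence, patterns):
    """Number of consecutive showings that move from one pattern to another."""
    pattern_of = {img: k for k, group in enumerate(patterns) for img in group}
    return sum(pattern_of[x] != pattern_of[y] for x, y in zip(sequence, sequence[1:]))

patterns = [[0, 1, 2], [3, 4, 5]]  # e.g. three horizontal and three vertical lines
rng = random.Random(5)
for stay in (0.0, 0.9):
    seq = markov_sequence(patterns, 1000, stay, rng)
    print(stay, pattern_switches(seq, patterns))
```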

The suggested variations of the methods of constructing learning
sequences considerably improve the functioning of recognition devices in
the simple learning regime. However, it is in the self-learning regime
that these variations are of principal importance, since it is only in
this direction that we can hope that the classification of images
performed by self-training devices will correspond to the original
classification performed by a human. It is clear that the description of
processes of this sort requires far more complex mathematical apparatus
than that which has been used in the present section.
§7. BEHAVIOR OF THE DISCRETE α-PERCEPTRONS IN THE SELF-LEARNING REGIME


In the preceding section we studied the behavior of discrete
α-perceptrons in the learning regime. The characteristic feature of the
learning regime is the presence of a teacher who knows the correct
classification of the images. In the present section we shall study some
questions associated with the behavior of discrete α-perceptrons in the
self-learning regime. In this case the teacher is absent, and the
processes of self-organization which lead to changes in the image
classification performed by the perceptron are determined by the
positive feedback introduced into the perceptron circuit.

It is well known that the analysis of the behavior of perceptrons in the
self-learning regime made by Rosenblatt [69] is very far from
mathematically rigorous. The absence of rigorously proved propositions
in this field leads some authors (particularly in popular science
publications) to ascribe to perceptron self-learning many properties
which it does not and cannot actually possess. On the basis of the
considerations of the present section, it is not difficult to draw
several conclusions which outline the boundaries of the actual
possibilities inherent in perceptron self-learning.

Let us consider a discrete α-perceptron designed for the recognition of
the two patterns P and Q. As the single output signal of the perceptron
we shall consider the difference of the signals of the summators of the
Pth and Qth patterns

$$V_i(\xi) = U_i^P(\xi) - U_i^Q(\xi). \qquad (99)$$

Here, just as in the equations of the preceding section, the index i
runs through all the images of both the Pth and the Qth patterns, and
$\xi$ is any sequence of images shown to the perceptron in the process of
its self-training.

Just as in the case of training, it is not difficult to show that the
functioning of a symmetric perceptron in the self-training regime is
determined by the sum a + b of the encouragement and penalty constants,
and not by these constants considered separately. Bearing in mind also
the possibility of arbitrarily varying the scale, it is permissible,
without loss of generality, to assume that a = 1 and b = 0. In what
follows we shall always make this assumption.

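Using $\|T_{ij}\|$ to denote the characteristic matrix of the perceptron
and recalling the definition of α-law encouragement, we easily obtain
the equation

$$V_i(\xi j) = V_i(\xi) + T_{ij}\,\mathrm{sign}\,V_j(\xi). \qquad (100)$$

Here the symbol $\xi j$ denotes the image sequence $\xi$ to which the image
j is appended. (When the image j is shown, the perceptron assigns it to P
if $V_j(\xi) \geq 0$ and to Q otherwise; each of the $T_{ij}$ neurons of the
chosen pattern that are stimulated by both i and j then gains unit
weight.)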

Equation (100) is valid for any pair of images i, j and for any image
sequence $\xi$. The function sign x, as usual, is taken equal to +1 for
positive values of x and equal to -1 for negative values of x. It is
clear that for a zero value of the quantity $V_j(\xi)$ the quantity
$\mathrm{sign}\,V_j(\xi)$ in equation (100) must be undefined, by the exact
meaning of the encouragement law (positive feedback). In order to avoid
indefiniteness, we shall henceforth, by definition, consider zero to be a
positive quantity, so that sign 0 = +1.

Keeping in mind this modification of the definition of $\operatorname{sign} x$, we shall regard equation (100) as a recurrent specification of the vector $V(\zeta) = (V_1(\zeta), V_2(\zeta), \ldots, V_m(\zeta))$, whose components are the output signals of the perceptron under the action of the images $j = 1, 2, \ldots, m$ after the image sequence $\zeta$ has been applied to its input. The initial value of this vector, $V(0) = (V_1(0), V_2(0), \ldots, V_m(0))$, is assumed given. The perceptron associates the image $j$ with the pattern $P$ or the pattern $Q$ according to whether the corresponding component $V_j(\zeta)$ of this vector is positive or negative (recall that zero, by the accepted convention, is considered a positive number).
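A minimal Python sketch of the recurrence (100) under the convention $\operatorname{sign} 0 = +1$ may help fix the notation. It assumes 0-based image indices, a characteristic matrix given as a list of lists, and hypothetical helper names (`sign`, `self_train`, `classify`); it illustrates the update rule only and is not a description of any particular device.

```python
from typing import List, Sequence


def sign(x: float) -> int:
    """Sign with the convention of the text: zero counts as positive."""
    return 1 if x >= 0 else -1


def self_train(V0: Sequence[float], T: Sequence[Sequence[float]],
               sequence: Sequence[int]) -> List[float]:
    """Apply equation (100) for each shown image j (0-based indices).

    The perceptron reinforces its own current answer for image j:
    every component V_i gains T[i][j] * sign(V_j).
    """
    V = list(V0)
    for j in sequence:
        s = sign(V[j])  # current self-assigned class of image j
        V = [V[i] + T[i][j] * s for i in range(len(V))]
    return V


def classify(V: Sequence[float]) -> List[str]:
    """Pattern P for a nonnegative component, pattern Q for a negative one."""
    return ["P" if sign(v) > 0 else "Q" for v in V]


if __name__ == "__main__":
    T = [[2, 1, 0], [1, 2, 0], [0, 0, 2]]  # small symmetric example matrix
    V = self_train([0, -1, -3], T, [0, 1, 2])
    print(V, classify(V))  # [3, 2, -5] ['P', 'P', 'Q']
```

In the usage example the zero initial component of image 0 is treated as positive, so showing image 0 lifts image 1 to zero, after which image 1 also counts as belonging to $P$.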

Since all the quantities $T_{ij}$ are integers (and, moreover, nonnegative), the analysis of the perceptron in the self-training regime essentially reduces to the problem of a random walk on a discrete lattice in a space of dimension at most $m$. Assuming that the images are displayed during self-learning according to the scheme of independent trials, it is easy to see that the transition probabilities from any point of such a lattice are determined solely by the set of signs of the coordinates of this point.

It is not difficult to see that the set of signs of the coordinates of any lattice point also determines the image classification performed by the perceptron whose output-signal vector is the radius vector of this point. From the point of view of perceptron theory, the primary interest lies in the limit distribution of the signs of the coordinates of the vector $V(\zeta)$ as the length of the training sequence $\zeta$ increases without bound. The analysis above shows that this distribution is obtained from the limit distribution of the Markov chain corresponding to the lattice walk just described.

Since this chain has infinitely many states, finding its limit distribution in the general case is quite complex. We can, however, point out several cases in which finding the limit distribution easily reduces to the study of a Markov chain with a finite number of states.

As an example, let us consider the discrete symmetric $\alpha$-perceptron $A$ designed for the recognition of $2n$ images, the first $n$ of which belong to the pattern $P$ and the last $n$ to the pattern $Q$. Assume further that the elements of the characteristic matrix of the perceptron $A$ satisfy $T_{ij} = a > 0$ if $i$ and $j$ belong to the same pattern, and $T_{ij} = 0$ if $i$ and $j$ belong to different patterns. By theorem 4 of §6 of the present chapter, this perceptron has absolute capacity for extrapolation and, consequently, behaves in the best possible way in the learning regime (it learns correct recognition after being shown at least one image of each pattern). Let us take as the initial conditions $V_i(0) = b$ $(b > 0)$ for $i = 1, 2, \ldots, m$ $(m \le n)$ and $V_j(0) = -b$ for $j = m+1, m+2, \ldots, 2n$.

From equation (100) it follows directly that $V_j(\zeta) < 0$ for any sequence $\zeta$ when $j = n+1, n+2, \ldots, 2n$. The remaining components are given by $V_i(\zeta) = b + ka$ for $i = 1, 2, \ldots, m$ and $V_i(\zeta) = -b + ka$ for $i = m+1, m+2, \ldots, n$, where $k$ is the difference between the number of appearances of images of the pattern $P$ whose components $V_j(\zeta')$ were positive and the number of appearances of images of $P$ whose components $V_j(\zeta')$ were negative ($\zeta'$ being, for each appearance, the initial segment of $\zeta$ that precedes it).

Let us assume that self-training proceeds according to the scheme of independent trials with equal probabilities of appearance of all the images. Since in this case the display of the images of one pattern has no effect on the recognition of the images of the other pattern, we may assume without loss of generality that only the images of the pattern $P$ take part in self-training (the images of the pattern $Q$ always correspond to a negative output signal, whether or not they are included in the self-training process).
Let us assume that $b$ is not a multiple of $a$, and let $t$ denote the integer $[b/a] + 1$. For $-t < k < t$ the components $V_1, \ldots, V_m$ are then positive and the components $V_{m+1}, \ldots, V_n$ negative. It is not difficult to see that, for the study of the functioning of the perceptron $A$, only the values of the parameter $k$ lying in the closed interval $[-t, t]$ are of interest. Indeed, if during self-training $k$ reaches the value $t$ even once, then from then on, by equation (100), it can only increase, and from that moment the perceptron delivers a positive output signal for all images of the pattern $P$ (which corresponds to correct classification). Similarly, once $k$ takes the value $-t$, the perceptron delivers a negative output signal for all images (which actually means the absence of any classification, since the perceptron assigns all the images to the same pattern).
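The trapping argument can be checked on a small instance by iterating (100) directly. The sketch below is a hypothetical illustration with arbitrarily chosen values $n = 3$, $m = 2$, $a = 2$, $b = 3$ (so that $t = 2$) and 0-based image indices; it shows that the components follow $\pm b + ka$ and stay on one side once $k$ reaches $\pm t$.

```python
from typing import List, Sequence, Tuple


def sign(x: int) -> int:
    return 1 if x >= 0 else -1  # zero counts as positive


def perceptron_a(n: int, m: int, a: int, b: int) -> Tuple[List[List[int]], List[int]]:
    """Characteristic matrix and initial outputs of the example perceptron A.

    Images 0..n-1 form pattern P, images n..2n-1 form pattern Q (0-based).
    T[i][j] = a within a pattern, 0 across patterns; V_i(0) = b for i < m, else -b.
    """
    T = [[a if (i < n) == (j < n) else 0 for j in range(2 * n)] for i in range(2 * n)]
    V0 = [b if i < m else -b for i in range(2 * n)]
    return T, V0


def run(T: Sequence[Sequence[int]], V0: Sequence[int], sequence: Sequence[int]) -> List[int]:
    """Iterate equation (100) over the shown images."""
    V = list(V0)
    for j in sequence:
        s = sign(V[j])
        V = [V[i] + T[i][j] * s for i in range(len(V))]
    return V


if __name__ == "__main__":
    n, m, a, b = 3, 2, 2, 3
    t = b // a + 1  # t = [b/a] + 1 = 2 here (b is not a multiple of a)
    T, V0 = perceptron_a(n, m, a, b)
    print(run(T, V0, [0, 0, 2]))  # [9, 9, 3, -3, -3, -3]: k passed t, all of P positive
    print(run(T, V0, [2, 2, 0]))  # [-3, -3, -9, -3, -3, -3]: k passed -t, all negative
```

Showing images of $Q$ only decreases the $Q$ components further and leaves the $P$ components untouched, which is why the analysis may restrict itself to images of $P$.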

Thus, as is easily seen, the limit behavior of the perceptron $A$ is determined by the Markov chain with $2t + 1$ states $k = -t, -t+1, \ldots, -1, 0, 1, \ldots, t-1, t$. In view of the assumption made about the probabilities of appearance of the images during self-training, for any $k$ other than $t$ and $-t$ the probability of transition into the state $k + 1$ equals $m/n$, and the probability of transition into the state $k - 1$ equals $(n-m)/n$. From the state $t$ (just as from the state $-t$) the only possible transition is into the same state, since from the point of view of the functioning of the perceptron all states with $k > t$ (respectively, $k < -t$) do not differ from the state $k = t$ (respectively, $k = -t$).

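Introducing the notations $p = m/n$ and $q = (n-m)/n$ and numbering the states in the order $k = t, t-1, \ldots, -t$, we obtain the matrix of transition probabilities of the chain under consideration:

$$\begin{pmatrix}
1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0\\
p & 0 & q & 0 & \cdots & 0 & 0 & 0\\
0 & p & 0 & q & \cdots & 0 & 0 & 0\\
\vdots & & & \ddots & \ddots & \ddots & & \vdots\\
0 & 0 & 0 & \cdots & p & 0 & q & 0\\
0 & 0 & 0 & 0 & \cdots & p & 0 & q\\
0 & 0 & 0 & 0 & \cdots & 0 & 0 & 1
\end{pmatrix}$$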


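This matrix has unity as a double characteristic root. The probabilities that the chain passes into the states $t$ and $-t$ are equal to the limit transition probabilities $p^{\infty}_{t+1,\,1}$ and $p^{\infty}_{t+1,\,2t+1}$. For the probability $p^{\infty}_{t+1,\,i}$ Perron's formula gives

$$p^{\infty}_{t+1,\,i} = \lim_{\lambda \to 1} \frac{d}{d\lambda}\left[\frac{(\lambda - 1)^2 M_{i,\,t+1}(\lambda)}{\Delta(\lambda)}\right], \qquad (101)$$

where $\Delta(\lambda)$ is the characteristic polynomial of the transition matrix and $M_{i,\,t+1}(\lambda)$ is the cofactor of the element in row $i$ and column $t+1$ of the characteristic matrix $\lambda E - \|p_{ij}\|$.

It is easy to see that for $i$ different from $1$ and $2t+1$ the cofactor $M_{i,\,t+1}(\lambda)$ contains the factor $(\lambda - 1)^2$, and therefore $p^{\infty}_{t+1,\,i} = 0$ for all such $i$. For $i = 1$ we have $M_{1,\,t+1}(\lambda) = (\lambda - 1) M_1(\lambda)$, and for $i = 2t+1$ we have $M_{2t+1,\,t+1}(\lambda) = (\lambda - 1) M_2(\lambda)$, where $M_1(\lambda) = p^t Q(\lambda)$ and $M_2(\lambda) = q^t R(\lambda)$.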

From equation (101) we easily obtain $p^{\infty}_{t+1,\,1} = c\,p^t$ and $p^{\infty}_{t+1,\,2t+1} = c\,q^t$.

Since all the remaining limit transition probabilities in the $(t+1)$th row are equal to zero, the stochasticity of the matrix of limit transition probabilities gives the value of $c$: $c = 1/(p^t + q^t)$. We have thereby proved the following proposition.

With unlimited continuation of the self-training process, the perceptron $A$ described above establishes the correct classification of the images with probability $p^t/(p^t + q^t)$ and assigns all the images to the same pattern with probability $q^t/(p^t + q^t)$.
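As an independent check of this proposition, the absorption probability can be computed exactly without Perron's formula, using the standard first-step (gambler's-ruin) recurrence $h(k) = p\,h(k+1) + q\,h(k-1)$ over the states $-t, \ldots, t$. This is a swapped-in technique, not the author's derivation; the sketch uses exact rational arithmetic and hypothetical function names, and assumes $0 < p \le 1$ and $t \ge 1$.

```python
from fractions import Fraction


def prob_correct(p: Fraction, t: int) -> Fraction:
    """Probability that the chain started at k = 0 is absorbed at k = t.

    h(k) = p*h(k+1) + q*h(k-1) for -t < k < t, with h(-t) = 0 and h(t) = 1.
    We set h(-t+1) = x, express every h(k) as coef[k] * x, then fix x by h(t) = 1.
    """
    q = 1 - p
    coef = {-t: Fraction(0), -t + 1: Fraction(1)}
    for k in range(-t + 1, t):
        coef[k + 1] = (coef[k] - q * coef[k - 1]) / p
    return coef[0] / coef[t]


def closed_form(p: Fraction, t: int) -> Fraction:
    """The value p^t / (p^t + q^t) from the proposition."""
    q = 1 - p
    return p**t / (p**t + q**t)


if __name__ == "__main__":
    for p in (Fraction(1, 3), Fraction(1, 2), Fraction(2, 3)):
        for t in range(1, 6):
            assert prob_correct(p, t) == closed_form(p, t)
    print(prob_correct(Fraction(2, 3), 2))  # 4/5
```

Exact fractions are used so that the comparison with $p^t/(p^t+q^t)$ is an equality test rather than a floating-point approximation.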

As the attentive reader can easily note, the example considered, strictly speaking, cannot be realized in a real perceptron, except in the trivial cases $m = n$, $p = 1$, $q = 0$ and $m = 0$, $p = 0$, $q = 1$. The reason is that, under the assumptions made about the characteristic matrix, all images of the same pattern stimulate the same set of neurons. Therefore the output signals induced by the images of the same pattern must always be equal to one another, including at the initial moment.

It is not difficult to see, however, that by setting $T_{ii} = a + \delta$ for all $i = 1, 2, \ldots, 2n$ $(\delta > 0)$ we make it possible to satisfy the initial conditions introduced in the example. Moreover, if $\delta$ is much smaller than $a$ and $t$ is relatively large, the perceptron behavior described in the example can serve as a good approximation of its real behavior.

Let us consider the complete discrete $\alpha$-perceptron $B$ with $(1, 1, 1)$-neurons and a square $(n \times n)$-retina, designed for the recognition of the two patterns $P$ and $Q$. The pattern $P$ consists of the $n$ horizontal lines, and the pattern $Q$ of the $n$ vertical lines. Each of these lines constitutes an individual image. In the preceding section it was noted that the perceptron $B$ can be regarded as the most characteristic representative of the class of perceptrons with random connections between the neurons and the retina. According to theorem 1 and the corollary following it in the work of Rosenblatt [69], such perceptrons, built from continuous neurons and operating with self-learning, must tend, with probability arbitrarily close to unity, to a state in which all the images are assigned to the same pattern. Let us show that this statement is not valid for the perceptron $B$.

It is easy to see that we can choose any initial conditions for the perceptron $B$. We shall call the smallest of the numbers $|V_i(0)|$ $(i = 1, 2, \ldots, 2n)$ the lower bound of the moduli of the initial conditions. Under the assumptions made, the following theorem holds.

Theorem 1. For any positive number $\varepsilon$, however small, there is a number $S$ such that, when the lower bound of the moduli of the initial conditions exceeds $S$, the perceptron $B$ in the self-training regime (with all images appearing with equal probability) retains the initial classification of the images with probability $p > 1 - \varepsilon$.

Proof. Let $N$ denote the length of the training sequence $\zeta$ and $\nu_i$ the number of appearances of the $i$th image $(i = 1, 2, \ldots, 2n)$ in this sequence. Let $V_i(0) = X_i$ $(i = 1, 2, \ldots, 2n)$; let $K_i$ be the set of all indices $j$ (images) belonging to the pattern opposite to that of $i$ and such that the sign of $X_j$ coincides with the sign of $X_i$, and let $Z_i$ be the set of all indices $j$ belonging to the pattern opposite to that of $i$ and such that the sign of $X_j$ is opposite to the sign of $X_i$ (as before, zero is considered a positive number).

As was shown in the preceding section, an arbitrary element $T_{ij}$ of the characteristic matrix of the perceptron $B$ equals $n^2(n-1)$, $0$ or $(n-1)^2$ according as the indices $i$ and $j$ coincide, do not coincide but belong to the same pattern, or do not coincide and belong to different patterns. Using this fact together with equation (100), we easily find that the original classification of the images is retained in the self-training process if for all $N = 1, 2, \ldots$ the inequalities

$$n^2(n-1)\,\nu_i + (n-1)^2\Big(\sum_{j \in K_i} \nu_j - \sum_{j \in Z_i} \nu_j\Big) + |X_i| > 0 \qquad (i = 1, 2, \ldots, 2n)$$

are satisfied, and all the more so if

$$n^2(n-1)\,\nu_i - (n-1)^2 \sum_{j \in K_i \cup Z_i} \nu_j + x > 0 \qquad (i = 1, 2, \ldots, 2n), \qquad (102)$$
where $x$ is the smallest of the numbers $|X_i|$ $(i = 1, 2, \ldots, 2n)$. In turn, it is not difficult to verify that inequalities (102) are satisfied if the inequalities

$$\left|\frac{\nu_i}{N} - \frac{1}{2n}\right| < \frac{1}{2n(2n-1)}\left(1 + \frac{2x}{(n-1)N}\right) \qquad (i = 1, 2, \ldots, 2n) \qquad (103)$$

are satisfied.
Denoting the quantity $1/(4n^2)$ by the letter $\delta$, we see that inequalities (103) are certainly satisfied if the inequalities

$$\left|\frac{\nu_i}{N} - \frac{1}{2n}\right| < \delta \qquad (i = 1, 2, \ldots, 2n) \qquad (104)$$

are satisfied.
By theorem 4 of §3 of the present chapter, there exist positive constants $a$ and $b$, independent of $N$, such that the probability $P_N$ that at least one of the inequalities (104) fails for a fixed value of $N$ satisfies the estimate

$$P_N \le \frac{a}{\sqrt{N}}\, e^{-bN}.$$

The probability that at least one of the inequalities (104) fails for some $N$ from $M$ to $\infty$ does not exceed the sum of the series $\sum_{N=M}^{\infty} \frac{a}{\sqrt{N}}\, e^{-bN}$, and this sum is clearly less than

$$R(M) = \frac{a\, e^{-bM}}{\sqrt{M}\,(1 - e^{-b})}.$$

As $M \to \infty$, the quantity $R(M)$ tends to zero. Let us choose $M$ so that $R(M) < \varepsilon$.





Now, taking $S = 2(M-1)n^2(n-1)$, we find that for $x > S$ the inequalities (103) are satisfied for all values $N = 1, 2, \ldots, M-1$. By the choice of $M$, for all the remaining values of $N$ the inequalities (104), and hence (103), are satisfied with probability greater than $1 - \varepsilon$. Since satisfaction of inequalities (103) for all values of $N$ from $1$ to $\infty$ means retention of the original classification of the images, the theorem is proved.
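The values of $T_{ij}$ used in the proof can be reproduced by brute force for small $n$. The sketch below rests on two assumptions not stated in this section: that a $(1, 1, 1)$-neuron has one excitatory and one inhibitory input and fires exactly when its excitatory retina point is lit and its inhibitory point is dark, and that $T_{ij}$ counts the neurons stimulated by both images $i$ and $j$. Under these assumptions the count gives $n^2(n-1)$, $0$ and $(n-1)^2$; all names are hypothetical.

```python
from typing import FrozenSet, List, Tuple

Cell = Tuple[int, int]


def line_images(n: int) -> List[FrozenSet[Cell]]:
    """Horizontal lines (pattern P) followed by vertical lines (pattern Q)."""
    horizontal = [frozenset((r, c) for c in range(n)) for r in range(n)]
    vertical = [frozenset((r, c) for r in range(n)) for c in range(n)]
    return horizontal + vertical


def characteristic_matrix(n: int) -> List[List[int]]:
    """Count, for each pair of images, the neurons stimulated by both."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    images = line_images(n)
    # Complete perceptron: one neuron for every (excitatory, inhibitory) pair of cells.
    neurons = [(e, h) for e in cells for h in cells if e != h]
    # fires[i][k] is True when image i stimulates neuron k.
    fires = [[e in img and h not in img for (e, h) in neurons] for img in images]
    size = len(images)
    return [[sum(x and y for x, y in zip(fires[i], fires[j])) for j in range(size)]
            for i in range(size)]


if __name__ == "__main__":
    n = 3
    T = characteristic_matrix(n)
    print(T[0][0], T[0][1], T[0][n])  # 18 0 4, i.e. n^2(n-1), 0, (n-1)^2
```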

Theorem 1 shows that, when the initial output signals for all images are sufficiently large, the perceptron under consideration is practically devoid of the capability not only for self-training but even for simply changing the image classification initially given to it. From the proof of the theorem it is easy to see that the remaining weak capability for self-alteration is greatest in the case of the correct initial classification. In other words, the perceptron has the least tendency to retain the correct mode of functioning.

Let us now consider $\beta$-law encouragement. To this end, let us fix an arbitrary number $\beta$ between zero and unity and consider an arbitrary symmetric perceptron $C$ with $\beta$-law encouragement whose characteristic matrix is diagonal, i.e., has nonzero elements only on the principal diagonal. As shown in the preceding section, this property is possessed by the symmetric perceptron with $(1, 1, 1)$-neurons that is designed for the recognition of horizontal and vertical lines and in which the inputs of each neuron are connected to retina elements lying on a single horizontal or a single vertical.

In the case of $\beta$-law encouragement the basic recurrence relation for the output signals takes the form

$$V_i(\zeta j) = (1 - \beta)\,V_i(\zeta) + T_{ij}\,\operatorname{sign} V_j(\zeta). \qquad (105)$$

The notation here is exactly the same as in equation (100), and this relation is likewise valid for any discrete symmetric perceptron. For perceptrons with a diagonal characteristic matrix, both terms on the right side of equations (100) and (105) always have the same sign (the case when the second term equals zero, in which $V_i$ simply keeps its sign, is excluded from consideration). The validity of the following proposition follows directly.

Theorem 2. A discrete symmetric perceptron $C$ with a diagonal characteristic matrix is completely devoid of the capability for self-learning (i.e., it retains unchanged any given original image classification), both with $\alpha$-law and with $\beta$-law encouragement.
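Theorem 2 can be illustrated by a small randomized run of (105), with $\beta = 0$ standing for $\alpha$-law encouragement. In this sketch the diagonal entries, initial outputs, step count and function names are arbitrary, hypothetical choices; exact fractions are used so that the repeated factor $(1-\beta)$ never rounds a nonzero output to zero.

```python
import random
from fractions import Fraction
from typing import List, Sequence


def sign(x) -> int:
    return 1 if x >= 0 else -1  # zero counts as positive


def step(V: Sequence[Fraction], T: Sequence[Sequence[int]], j: int,
         beta: Fraction) -> List[Fraction]:
    """Equation (105); beta = 0 gives equation (100), i.e. alpha-law encouragement."""
    s = sign(V[j])
    return [(1 - beta) * V[i] + T[i][j] * s for i in range(len(V))]


def keeps_classification(diag: Sequence[int], V0: Sequence[int], beta: Fraction,
                         steps: int, seed: int) -> bool:
    """Self-train a perceptron with diagonal characteristic matrix on random images."""
    size = len(diag)
    T = [[diag[i] if i == j else 0 for j in range(size)] for i in range(size)]
    rng = random.Random(seed)
    V = [Fraction(v) for v in V0]
    initial = [sign(v) for v in V]
    for _ in range(steps):
        V = step(V, T, rng.randrange(size), beta)
        if [sign(v) for v in V] != initial:
            return False
    return True


if __name__ == "__main__":
    for beta in (Fraction(0), Fraction(1, 3)):
        print(beta, keeps_classification([5, 2, 7, 1], [3, -4, 0, -1], beta, 200, seed=0))
```

Both runs report that the classification is kept, including for the image whose initial output is zero and is therefore counted as positive.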


It is also easy to see the validity of the following proposition. 
Theorem 3. No discrete symmetric perceptron (with either $\alpha$- or $\beta$-law encouragement) operating in the self-training regime can alter the original image classification if this classification assigns all the images to the same pattern.
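Theorem 3 rests only on the nonnegativity of the $T_{ij}$: when every component has the same sign, each step of (100) or (105) adds a term of that same sign. A hypothetical randomized check, with an arbitrary nonnegative symmetric integer matrix standing in for a real perceptron and $\beta = 0$ representing the $\alpha$-law:

```python
import random
from fractions import Fraction
from typing import List, Sequence


def sign(x) -> int:
    return 1 if x >= 0 else -1  # zero counts as positive


def random_symmetric_nonnegative(size: int, rng: random.Random) -> List[List[int]]:
    """An arbitrary symmetric matrix with nonnegative integer entries."""
    T = [[0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i, size):
            T[i][j] = T[j][i] = rng.randrange(0, 5)
    return T


def final_signs(V0: Sequence[int], T: Sequence[Sequence[int]], beta: Fraction,
                sequence: Sequence[int]) -> List[int]:
    """Run equation (105) over the sequence and return the output signs."""
    V = [Fraction(v) for v in V0]
    for j in sequence:
        s = sign(V[j])
        V = [(1 - beta) * V[i] + T[i][j] * s for i in range(len(V))]
    return [sign(v) for v in V]


if __name__ == "__main__":
    rng = random.Random(42)
    T = random_symmetric_nonnegative(5, rng)
    seq = [rng.randrange(5) for _ in range(100)]
    print(final_signs([-2, -1, -7, -3, -1], T, Fraction(0), seq))  # [-1, -1, -1, -1, -1]
    print(final_signs([0, 4, 1, 9, 2], T, Fraction(1, 2), seq))    # [1, 1, 1, 1, 1]
```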

Our results can be regarded as counterexamples to the results of Rosenblatt [69], insofar as his arguments apply not only to continuous but also to discrete neurons. In any case, these results indicate that the asymptotic behavior of perceptrons in the self-training regime is far more complex and requires considerably more precise methods of study than the purely qualitative techniques used by Rosenblatt [69].

To obtain a visual picture of the peculiarities of perceptron behavior in the self-learning regime as compared with the learning regime, let us consider the case when the number of images is two (each pattern consists of a single image). This case admits a simple graphical interpretation.

First we note that when there are two patterns (but an arbitrary number of images), the functioning of the perceptron in both the learning and the self-learning regimes is conveniently characterized by the vector with components $V_i(\zeta)$ (see above). In the learning regime the basic recurrence relation for these components obviously has the form

$$V_i(\zeta j) = V_i(\zeta) \pm T_{ij}. \qquad (106)$$

This relation is valid for any pair of images $i$, $j$ and for any training sequence $\zeta$. The second term on the right side is taken with the plus sign if in the correct classification the image $j$ corresponds to a positive output signal, and with the minus sign if the corresponding output signal must be negative.


Let us consider the discrete symmetric perceptron with $\alpha$-law encouragement whose characteristic matrix is

$$T = \begin{pmatrix} a & b \\ b & a \end{pmatrix},$$

where $a > b > 0$. Let us assume that in the correct classification the first image must induce a positive output signal and the second image a negative one. Plotting the coordinate $V_1(\zeta)$ along the horizontal axis and the coordinate $V_2(\zeta)$ along the vertical axis, we associate with each vector $(V_1(\zeta), V_2(\zeta))$ a point of the plane. Selecting one point in each quadrant, we obtain a visual impression of the action of equation (106) (Fig. 13).

Fig. 13

In Fig. 13 the symbol $T_1$ denotes the vector $(a, b)$ and the symbol $T_2$ the vector $(b, a)$. The characteristic feature of the training regime is that the directions of the vectors defining the random walk of the point on the lattice do not depend on the position of the point on the plane. The resultant of these vectors is always directed toward the quadrant in which the signs of the coordinates (the output signals of the perceptron) coincide with the correct classification (in the present case this is the hatched fourth quadrant).

Fig. 14

For the self-training regime the interpretation of the corresponding equation (100) is shown in Fig. 14. In contrast with the previous case, the directions of the vectors defining the random walk differ from quadrant to quadrant. The vectors are labeled as in Fig. 13.

It is not difficult to note the qualitative differences between the situation shown in Fig. 14 and that shown in Fig. 13. First of all, the first and third quadrants (shaded in Fig. 14) now possess a trapping property: a point that falls into one of these quadrants in the course of the random walk can never escape from it. Entering these quadrants actually means the absence of any classification (both images are assigned to the same pattern). Moreover, from the quadrants corresponding to correct classification up to the sign of the output signal (the second and fourth quadrants) there is always a nonzero probability of exit into the neighboring quadrants.

If we considered the resulting situation purely qualitatively, in the manner of Rosenblatt [69], we would have to conclude that the perceptron under study tends asymptotically to a state in which output signals of the same sign (absence of any classification) are generated for all images. A more thorough consideration (repeating the analysis made in the proof of theorem 1) leads, however, to a completely different conclusion: just as in theorem 1, if the initial point is sufficiently far from the boundaries of the quadrant, the probability that the random walk never leaves this quadrant at any subsequent instant (right up to infinity) can be made arbitrarily close to unity.
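The contrast between the qualitative and the quantitative argument can be seen in a small Monte Carlo experiment. The sketch assumes $a = 3$, $b = 1$, equiprobable images, a finite horizon standing in for "right up to infinity", and arbitrary starting points; names and parameters are hypothetical. A start next to the boundary of the fourth quadrant is usually trapped in the first or third quadrant, while a start far from the boundary practically never leaves.

```python
import random


def sign(x: int) -> int:
    return 1 if x >= 0 else -1  # zero counts as positive


def stays_in_correct_quadrant(v1: int, v2: int, a: int, b: int,
                              steps: int, rng: random.Random) -> bool:
    """Self-training walk (100) for T = [[a, b], [b, a]], images shown equiprobably.

    Returns True if the point never leaves the quadrant V1 >= 0, V2 < 0
    (correct classification) during the given number of steps.
    """
    for _ in range(steps):
        if rng.random() < 0.5:  # image 1 shown: move by sign(V1) * T_1
            s = sign(v1)
            v1, v2 = v1 + a * s, v2 + b * s
        else:                   # image 2 shown: move by sign(V2) * T_2
            s = sign(v2)
            v1, v2 = v1 + b * s, v2 + a * s
        if not (v1 >= 0 and v2 < 0):
            return False
    return True


def stay_fraction(v1: int, v2: int, a: int = 3, b: int = 1,
                  steps: int = 1000, trials: int = 300, seed: int = 1) -> float:
    """Share of simulated walks that remain in the correct quadrant."""
    rng = random.Random(seed)
    kept = sum(stays_in_correct_quadrant(v1, v2, a, b, steps, rng) for _ in range(trials))
    return kept / trials


if __name__ == "__main__":
    print(stay_fraction(1, -1))      # start next to the boundary
    print(stay_fraction(100, -100))  # start far from the boundary
```

Inside the fourth quadrant the mean displacement per step is $((a-b)/2, (b-a)/2)$, pointing away from both axes, which is why a distant start is so rarely lost.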

This once again underscores the danger of basing general conclusions about the asymptotic behavior of perceptrons on arguments of a purely qualitative character, without confirming them by exact computations and estimates.

§8. LOGICAL CLASSIFICATION SYSTEMS AND CONDITIONAL PROBABILITY MACHINES 

The pattern recognition systems considered in the preceding sections are devices for classifying certain subsets of the image space. Having in mind visual, auditory and other patterns that are continuous in nature, we to some degree transferred the property of continuity to the corresponding classification systems. Indeed, even with clearly discrete receptors, the hypothesis of the $N$-extrapolatability of the patterns, which presupposes their continuity, allowed only those sets that were "well arranged" in the corresponding sense to be selected for classification. The same implicit use of the continuity of the patterns is present in perceptrons (including discrete ones) and in all the other pattern recognition algorithms and devices mentioned in the preceding sections.

Limiting the number of image-space subsets that are considered and classified is of prime importance for visual, auditory and other patterns of a continuous nature, since without such a limitation the recognition problem would be practically unsolvable for these patterns.

Indeed, when the retina consists of $n$ binary receptors, the image space consists of $I(n) = 2^n$ different images and contains $Q(n) = 2^{2^n}$ different subsets, i.e., possible discrete patterns. For $n = 5$ the second of these quantities already exceeds four billion, and for a still relatively small number of receptors, such as 100, the first quantity is of the order of a one followed by thirty zeros.
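The orders of magnitude quoted above are easy to confirm with Python's arbitrary-precision integers; the function names are illustrative only.

```python
def image_count(n: int) -> int:
    """I(n) = 2^n distinct images on a retina of n binary receptors."""
    return 2 ** n


def pattern_count(n: int) -> int:
    """Q(n) = 2^(2^n) subsets of the image space, i.e. possible patterns."""
    return 2 ** image_count(n)


if __name__ == "__main__":
    print(pattern_count(5))            # 4294967296
    print(len(str(image_count(100))))  # 31 digits, i.e. about 1.27 * 10**30
```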

In view of these considerations it becomes evident that the problem of constructing devices that store (or generate) the features of all possible patterns is practically unsolvable for large values of $n$. However, when the number of (binary) receptors does not exceed ten or fifteen, it is in practice quite possible to construct a machine capable of remembering and performing various operations with all the images (but not with all the patterns) that can be reproduced with the corresponding retina.

Machines that classify all the possible images obtainable from binary receptors will be called logical classification machines. We shall describe one possible scheme of such machines, proposed by Attlee [2].

For each property of the image, the Attlee classification machine contains a so-called discriminative element, which is stimulated when the image has this property. Here and hereafter, by an image property we shall mean the presence, in some set $M$ of receptors (depending on the choice of the property), of a definite combination of output signals (zeros and ones); the receptors not belonging to $M$ may have any output signals. The property that the receptors with numbers $i_1, i_2, \ldots, i_k$ have unit output signals and the receptors with numbers $j_1, j_2, \ldots, j_l$ have zero output signals will be denoted by $(i_1, i_2, \ldots, i_k;\ j_1, j_2, \ldots, j_l)$.

From these definitions it is clear that specifying a property is equivalent to assigning to the output signal of each retina receptor one of three values: one, zero, or indifferent. If we denote the total number of receptors composing the retina by $N$, it is easy to see that, with a total of $2^N$ images, the number of their different properties equals $3^N$. All the properties of any given image can be obtained by replacing some of the signals (zeros and ones) composing this image by indifferent signals. The number of such replacements (and hence the number of properties of each image) clearly equals the sum $C_N^0 + C_N^1 + \cdots + C_N^N = 2^N$. Thus each image stimulates $2^N$ elements of the classification machine.
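The counts $3^N$ and $2^N$ can be confirmed by enumeration for small $N$. In this sketch a property is encoded, as an assumption of the example, as a tuple holding 1, 0 or `None` (indifferent) for each receptor; the function names are hypothetical.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

Property = Tuple[Optional[int], ...]  # 1, 0, or None ("indifferent") per receptor


def all_properties(N: int) -> List[Property]:
    return list(product((1, 0, None), repeat=N))


def has_property(image: Sequence[int], prop: Property) -> bool:
    """The image has the property if it agrees with it wherever it is not indifferent."""
    return all(p is None or p == x for x, p in zip(image, prop))


def properties_of(image: Sequence[int]) -> List[Property]:
    return [p for p in all_properties(len(image)) if has_property(image, p)]


if __name__ == "__main__":
    N = 4
    print(len(all_properties(N)))            # 81 = 3**4
    print(len(properties_of((1, 0, 0, 1))))  # 16 = 2**4
```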

Let us call the property of an image of stimulating the $i$th receptor the $i$th elementary property, and any combination of elementary properties a positive property. With $N$ binary receptors there are only $N$ elementary properties in all. It is also clear that specifying any positive property is equivalent to specifying some subset of receptors that produce unit output signals. The total number of positive properties is thus equal to the number of subsets of a set of $N$ elements, i.e., $2^N$.

Attlee calls the classification machine just described, which can distinguish any image properties, the binary classification machine, in contrast with the so-called unitary classification machine, which can distinguish only positive properties. It is easy to see that for every binary machine there exists a unitary machine, equivalent to it with respect to the classification performed, that contains twice as many receptors. We need only add, for each receptor reacting to a particular elementary property, another receptor reacting to the absence of this property. Although at first glance it appears that we then need $2^{2N}$ discriminative elements, in fact many of them are redundant, since they will never be stimulated (they require a receptor and its complementary receptor to be stimulated simultaneously). After the redundant elements are removed, the number of remaining elements is exactly the same as in the binary machine, i.e., $3^N$. The simplicity of reducing binary machines to unitary ones allows us henceforth to restrict ourselves to unitary machines.
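The reduction from a binary to a unitary machine can be checked by counting. In the sketch below (an illustration with hypothetical names), receptor $(i, 1)$ reacts to the $i$th elementary property and receptor $(i, 0)$ to its absence; a discriminative element is a subset of these $2N$ receptors, and it is redundant when it contains both members of a complementary pair.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple

Receptor = Tuple[int, int]  # (i, 1): receptor i is lit; (i, 0): its complementary receptor


def unitary_elements(N: int) -> Tuple[int, List[FrozenSet[Receptor]]]:
    """Return the number of all subsets of the 2N receptors and the useful subsets."""
    receptors = [(i, v) for i in range(N) for v in (1, 0)]
    total, useful = 0, []
    for r in range(len(receptors) + 1):
        for subset in combinations(receptors, r):
            total += 1
            indices = [i for i, _ in subset]
            # A subset containing both (i, 1) and (i, 0) can never be stimulated.
            if len(indices) == len(set(indices)):
                useful.append(frozenset(subset))
    return total, useful


if __name__ == "__main__":
    total, useful = unitary_elements(3)
    print(total, len(useful))  # 64 27, i.e. 2**(2*3) and 3**3
```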

It is convenient to picture the discriminative elements of a unitary classification machine with $N$ receptors as neurons having from one to $N$ input channels and capable of being stimulated only when all their input channels are stimulated simultaneously. Each input channel of a neuron is connected to the output channel of some receptor. A neuron whose inputs are connected to the receptors with numbers $i_1, i_2, \ldots, i_k$ corresponds to the positive property $(i_1, i_2, \ldots, i_k)$ and is stimulated only when this property is present in the image being recognized. For the total number of neurons to be exactly $2^N$, we must also assume the presence of one more neuron, without input channels, which is stimulated constantly regardless of the image being recognized. This neuron corresponds to the property formed by the empty set of elementary properties.

Let us consider an arbitrary receptor $i$ and all the positive properties containing the elementary property $i$. Among these properties there is exactly one containing one elementary property (namely, the property $i$ itself), exactly $C_{N-1}^1 = N - 1$ properties containing two elementary properties each (all properties of the form $(i, j)$, where $j \ne i$), exactly $C_{N-1}^2$ properties containing three elementary properties each, and so on. In the unitary machine one neuron corresponds to each positive property. The total number of neurons connected to the receptor $i$ is therefore given by the sum

$$1 + C_{N-1}^1 + C_{N-1}^2 + \cdots + C_{N-1}^{N-1} = 2^{N-1},$$

which is exactly half of all the neurons in the unitary machine.
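The counting argument, and with it the connection probability $1/2$, can be verified by enumerating all positive properties for a small $N$; the helper names are hypothetical.

```python
from itertools import combinations
from math import comb
from typing import FrozenSet, List


def positive_properties(N: int) -> List[FrozenSet[int]]:
    """All subsets of the receptors 0..N-1, including the empty one."""
    return [frozenset(c) for r in range(N + 1) for c in combinations(range(N), r)]


def neurons_on_receptor(N: int, i: int) -> int:
    """Number of unitary-machine neurons with an input from receptor i."""
    return sum(1 for prop in positive_properties(N) if i in prop)


if __name__ == "__main__":
    N = 6
    print(neurons_on_receptor(N, 2), len(positive_properties(N)))  # 32 64
    print(sum(comb(N - 1, r) for r in range(N)))                   # 32
```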

Selecting a neuron at random, with all neurons considered equiprobable, we conclude that the probability that the selected neuron is connected to any given receptor $i$ equals $1/2$. Thus a connection scheme that is to some degree close to that of the unitary classification machine can be obtained by connecting the neurons to the receptors at random, with equal probabilities of connection and nonconnection of the input channels of a given neuron to a given receptor.

In the described scheme of the classification machine (binary or unitary) there are no elements of self-organization or self-improvement at all. Therefore, following Attlee, we introduce changes and additions into the scheme of the unitary machine, after which this machine becomes the so-called conditional probability machine.

To simplify the notation, we shall denote positive properties of the images by capital Latin letters. If $I$ and $J$ are positive properties, we use $I \cup J$ to denote their union, i.e., the positive property consisting of all the elementary properties occurring in $I$, in $J$, or in both. We use $I \cap J$ to denote the intersection of the properties $I$ and $J$, i.e., the positive property consisting of those and only those elementary properties that occur in both $I$ and $J$.

Let us now assume that a training sequence, i.e., simply a sequence of images, is applied to the input of some unitary machine. Generally speaking, not all the terms of this sequence possess a fixed property $I$. Therefore the neuron corresponding to $I$ is stimulated by some terms of the training sequence and not by others. The ratio of the number of terms of the training sequence that possess the property $I$ (and, consequently, stimulate this neuron) to the total number of terms of the sequence is naturally called the frequency of the property for the given training sequence (which we shall also call the training history). To distinguish it more clearly from the conditioned frequency introduced later, we shall call the frequency just defined the unconditioned frequency.

We designate the unconditioned frequency of the property I by 
p(I); here the training history is assumed to be fixed. 

Let us impose on the neurons of the unitary classification machine the additional function of computing the unconditioned frequency of appearance of the properties corresponding to them. If the image applied to the machine input possesses some property $I$, then the neuron corresponding to this property, after computing and storing the unconditioned frequency of $I$, delivers at that moment an output signal equal to one. If, however, this neuron is not stimulated (i.e., if the current image does not possess the property $I$), its output signal is the value of the unconditioned frequency of $I$ stored in the neuron.

With these additions and alterations, the unitary machine acquires certain features characteristic, if not of self-organizing automata, then at any rate of self-adaptive automata. A further improvement involves computing the so-called conditioned frequencies of the properties classified by the unitary machine.

The conditioned frequency $p(I/J)$ of the property $I$ with respect to the property $J$ is the ratio of the number of joint appearances of the properties $I$ and $J$ (i.e., of appearances of the property $I \cup J$) to the total number of appearances of the property $J$:

$$p(I/J) = \frac{p(I \cup J)}{p(J)}. \qquad (107)$$
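A minimal sketch of the frequency bookkeeping, assuming a training history stored as a list of binary tuples and a positive property represented as a set of receptor indices, so that an image possesses $I$ exactly when $I$ is contained in its maximal positive property. The function names and the small history are hypothetical.

```python
from fractions import Fraction
from typing import FrozenSet, Sequence

Image = Sequence[int]


def maximal_property(image: Image) -> FrozenSet[int]:
    """The union of all elementary properties of the image: receptors with output 1."""
    return frozenset(i for i, x in enumerate(image) if x == 1)


def frequency(prop: FrozenSet[int], history: Sequence[Image]) -> Fraction:
    """Unconditioned frequency p(I): share of images possessing the positive property I."""
    hits = sum(1 for image in history if prop <= maximal_property(image))
    return Fraction(hits, len(history))


def conditioned_frequency(I: FrozenSet[int], J: FrozenSet[int],
                          history: Sequence[Image]) -> Fraction:
    """p(I/J) = p(I ∪ J) / p(J), equation (107); p(J) must be nonzero."""
    return frequency(I | J, history) / frequency(J, history)


if __name__ == "__main__":
    history = [(1, 1, 0), (1, 0, 0), (1, 1, 1), (0, 1, 1)]
    print(frequency(frozenset({0}), history))                              # 3/4
    print(conditioned_frequency(frozenset({1}), frozenset({0}), history))  # 2/3
```

Counting directly gives the same value: of the three images having receptor 0 lit, two also have receptor 1 lit.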


We shall, as before, assume that the neurons of the unitary machine compute and store the unconditioned frequencies of the properties corresponding to them. Just as before, when a neuron is directly stimulated (i.e., when the current image has the property corresponding to this neuron), it delivers a signal equal to one. However, all the neurons that are not directly stimulated must now deliver not the unconditioned but the conditioned frequencies of the properties corresponding to them. The only question is: relative to which property should these conditioned frequencies be calculated? It is easy to see that the most natural choice for the property $J$ is the maximal positive property of the image under consideration, i.e., the union of all the elementary properties that the given image possesses. Indeed, only the maximal property completely determines the image corresponding to it, so that the conditioned frequencies will actually be referred to the frequency of appearance of this image.


Let us introduce the concept of subordination for the neurons of
the unitary machine. We say that the neuron A is subordinate to the
neuron B if the property J corresponding to the neuron B contains all
the elementary properties of the property I corresponding to the
neuron A.





The neuron Q corresponding to the maximal positive property of
some fixed image is obviously characterized by the fact that all the
neurons which are directly stimulated by this image are subordinate to
it, while it is not subordinate to any of these neurons except, of
course, itself. All the neurons to which the neuron Q is subordinate
constitute the so-called superset M(Q) of this neuron. In the case
being considered none of the neurons P from the set M(Q), except Q
itself, is directly stimulated, and therefore each of them must deliver
a signal equal to the conditioned frequency p(I/J) of the property I,
corresponding to the neuron P, relative to the property J, corresponding
to the neuron Q. From the definition of the superset it follows directly
that $I \cup J = I$. Therefore, as the result of equation (107),


$$p(I/J) = \frac{p(I)}{p(J)}. \qquad (108)$$


Thus, for all the neurons from the superset M(Q) of the neuron Q
the output signals can be determined from equation (108): to obtain
the output signal of a neuron P from M(Q), the value of the
unconditioned frequency of the corresponding property stored in P must
be divided by the value of the unconditioned frequency stored in the
neuron Q. Uttley terms this operation for obtaining the output signals
of the neurons from the superset M(Q) the supercontrol operation. The
supercontrol operation does not lead to a contradiction even for the
neuron Q itself, since in this case the output signal computed using
equation (108) will obviously be equal to unity, which agrees with the
known value of the output signal of the neuron Q obtained from the
condition of the direct stimulation of this neuron.
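The supercontrol operation can be sketched in a hypothetical set-based model: one neuron per nonempty subset of a small set of elementary properties a, b, c, each neuron storing its unconditioned frequency. This is an illustration, not a description of the original machine, and it assumes the current image has already appeared in the history, so that p(J) > 0.

```python
from fractions import Fraction
from itertools import combinations

ELEMENTARY = "abc"  # hypothetical elementary properties


def all_properties(elementary):
    """One neuron per nonempty positive property (a set of elementary ones)."""
    return [frozenset(c) for size in range(1, len(elementary) + 1)
            for c in combinations(elementary, size)]


def frequency(history, prop):
    """Unconditioned frequency stored in the neuron of the property `prop`."""
    return Fraction(sum(prop <= image for image in history), len(history))


def supercontrol(history, image):
    """Output signals of the neurons of the superset M(Q), equation (108).

    Q is the neuron of the image's maximal positive property J; a neuron
    with property I lies in M(Q) when J is a subset of I.
    """
    j = frozenset(image)
    p_j = frequency(history, j)            # value stored in the neuron Q
    return {i: frequency(history, i) / p_j
            for i in all_properties(ELEMENTARY) if j <= i}


history = [frozenset("ab"), frozenset("a"), frozenset("abc"), frozenset("bc")]
for prop, signal in supercontrol(history, "a").items():
    print("".join(sorted(prop)), signal)
```

As the text notes, the neuron Q itself (here {a}) receives the signal 1.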

The set of all the neurons subordinate to a given neuron P will
be termed the subset of this neuron and will be denoted by L(P). Using
the concept of the subset, it is not difficult to determine how to
obtain the output signals of all the neurons which are not subjected at
the given moment to direct stimulation and do not enter into the
superset of the neuron Q corresponding to the maximal positive property
of the image being considered at the given step.

Let us denote the set of all such neurons by K, and let R be any
neuron from K. If, as before, J denotes the property corresponding to
the neuron Q and I the property corresponding to the neuron R, then the
neuron P which corresponds to the property $I \cup J$ will obviously
belong to the superset M(Q).

As the result of equations (107) and (108), the output signal of
the neuron R is equal to the output signal of the neuron P: both are
equal to $p(I \cup J)/p(J)$. Moreover, it is clear that the neuron R
occurs in the subset L(P) of the neuron P. This suggests the conclusion
that all the output signals not determined so far (for the neurons of
the set K) can be obtained by simply transferring the output signals of
the neurons from the superset M(Q) to the neurons of their subsets. By
analogy with supercontrol, it is natural to term this transfer of the
output signals to the subsets the subcontrol operation.
However, the subcontrol operation defined in this way does not
lead to a unique determination of the output signals, since a neuron R
from K appears not in one but, generally speaking, in several subsets of
various neurons from M(Q). To eliminate the resulting ambiguity we note
that in the set H of all the neurons from M(Q) to which the neuron R is
subordinate, the neuron P of interest to us (corresponding to the union
of the properties I and J defined above) is subordinate to all the
remaining neurons of this set. It is easy to see that the property
$I \cup J$ then has the highest unconditioned frequency among the
properties corresponding to the neurons from H, since every image
possessing a larger property also possesses $I \cup J$. By equation
(108) this means that the output signal of the neuron P is the highest
among the output signals of all the neurons from the set H.

Thus, to ensure error-free functioning of the machine the subcon- 
trol operation must be supplemented by still another rule: if as the 
result of the subcontrol operation several different output signals are 
transferred to some neuron, the largest of them must be selected. 

Let us emphasize once again that the subcontrol operation is not
applied to the neurons whose output signals are determined from the
condition of direct stimulation or by the supercontrol operation.
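Direct stimulation, supercontrol, and subcontrol with the largest-signal rule can be combined in one sketch. It uses a hypothetical set-based model (one neuron per nonempty subset of the elementary properties a, b, c; an image is the set of elementary properties it has) and an illustrative history. The final line checks the argument made above: every output signal coincides with p(I/J) computed directly from equation (107).

```python
from fractions import Fraction
from itertools import combinations

ELEMENTARY = "abc"  # hypothetical elementary properties


def all_properties(elementary):
    return [frozenset(c) for size in range(1, len(elementary) + 1)
            for c in combinations(elementary, size)]


def frequency(history, prop):
    return Fraction(sum(prop <= image for image in history), len(history))


def machine_outputs(history, image):
    """Output signals of every neuron of the conditional probability machine."""
    neurons = all_properties(ELEMENTARY)
    j = frozenset(image)                  # maximal positive property (neuron Q)
    p_j = frequency(history, j)
    out = {}
    # Direct stimulation: the image possesses the neuron's property.
    for i in neurons:
        if i <= j:
            out[i] = Fraction(1)
    # Supercontrol over M(Q), equation (108); it gives 1 for Q itself.
    superset = [i for i in neurons if j <= i]
    for i in superset:
        out[i] = frequency(history, i) / p_j
    # Subcontrol: each remaining neuron R receives the signal of every
    # neuron of M(Q) to which it is subordinate and keeps the largest.
    for r in neurons:
        if r not in out:
            out[r] = max(out[p] for p in superset if r <= p)
    return out


history = [frozenset("ab"), frozenset("a"), frozenset("abc"), frozenset("bc")]
outputs = machine_outputs(history, "ab")
j = frozenset("ab")
print(all(v == frequency(history, i | j) / frequency(history, j)
          for i, v in outputs.items()))
```

The `max` in the subcontrol step always has at least one candidate, because the neuron of $I \cup J$ lies in M(Q) and R is subordinate to it.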

The unitary machine with all the described additions and
alterations in the neuron functioning laws is termed a conditional
probability machine. This name emphasizes the fact that, for
sufficiently long training histories (which we will assume are usually
generated by the independent trials scheme), the conditioned
frequencies computed by the machine tend, with an arbitrarily high
confidence level, to the corresponding conditional probabilities of
some properties with respect to others. The conditional probability
machine is used to simulate the processes of development and decay of
the so-called conditioned reflexes. Let us assume that throughout the
entire training history of the machine the property J appeared almost
always together with the property I. In that case the conditioned
frequency p(J/I) of the property J with respect to the property I will
be close to unity. If now the property I appears without the property
J, then the reaction (output signal) of the neuron Q corresponding to
the property J will differ little in its intensity from the reaction of
the neuron P, which corresponds to the property I and consequently is
directly stimulated by this property. We will say in this case that a
conditioned reflex for the property J with relation to the property I
has been developed in the machine.

If after the development of the indicated reflex it is not
reinforced over a sufficiently large number of steps in the succeeding
training history (i.e., the property I appears without the property J),
then the conditioned frequency p(J/I) will diminish and can in the
course of time become negligibly small. At the next appearance of the
property I without the property J the reaction of the neuron Q will be
very slight. In this case we shall speak of the decay of the
corresponding reflex.

These processes of the development and decay of the conditioned
reflexes are, at least superficially, quite similar to the analogous
processes taking place in living organisms, in particular in the human
nervous system. At the same time there are several differences. One of
the essential differences is that in the scheme we have described the
rate of development and decay of the conditioned reflexes is too high
at the very beginning of the training process and too low at its end.
This situation can be rectified by replacing the unconditioned and
conditioned frequencies by the so-called pseudofrequencies.
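To make the rate problem concrete, the following sketch tracks the running conditioned frequency p(J/I) over hypothetical histories in which I appears at every step. A single nonreinforced step right after the first reinforced one halves p(J/I), while after 100 reinforced steps it takes 100 nonreinforced steps to bring p(J/I) down to the same value.

```python
def conditioned_frequency_trace(steps):
    """Running p(J/I) after each training step.

    `steps` holds (has_i, has_j) pairs; p(J/I) is the number of joint
    appearances of I and J divided by the number of appearances of I
    (None while I has not yet appeared).
    """
    count_i = count_ij = 0
    trace = []
    for has_i, has_j in steps:
        if has_i:
            count_i += 1
            if has_j:
                count_ij += 1
        trace.append(count_ij / count_i if count_i else None)
    return trace


# Early in training: one reinforced step, then one nonreinforced step.
early = conditioned_frequency_trace([(True, True), (True, False)])

# Late in training: 100 reinforced steps, then 100 nonreinforced steps.
steps = [(True, True)] * 100 + [(True, False)] * 100
trace = conditioned_frequency_trace(steps)
print(early[-1], trace[99], trace[199])
```

The step size of a running frequency shrinks roughly like 1/n, which is exactly what makes late changes so slow.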

The unconditioned pseudofrequency of any given property I is a
quantity r lying between zero and one, which increases with the
appearance of images having the property I and decreases with the
appearance of images not having it. The quantity r must tend to one
if, from some moment on, all the terms of the training sequence have
the property I, and must tend to zero if, from some moment on, none of
the terms of the training sequence possesses the property I (we assume
here that the training sequence can be extended with new terms for an
arbitrarily long time).

This definition is satisfied, in particular, by the unconditioned
frequency, which can therefore be considered as one of the possible
ways of specifying the unconditioned pseudofrequency. However, it is
simpler and more convenient to take as the unconditioned
pseudofrequency the quantity $r_n = r_n(I)$ given by the recurrence
relations

$$r_{n+1} = \begin{cases} r_n + \alpha\,(1 - r_n), \\ r_n - \beta\, r_n \end{cases} \qquad (n = 0, 1, 2, \ldots), \qquad (109)$$

where the quantities $\alpha$ and $\beta$ are positive constants which
are strictly less than unity. If at the (n + 1)th step of the training
the property I appears, then we use the upper line of these relations;
otherwise we use the lower line. The initial value $r_0$ of the
quantity $r_n$ can be selected arbitrarily on the interval (0, 1).
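A sketch of the recurrence (109) in Python; the values $\alpha = \beta = 0.1$ and $r_0 = 0.5$ are arbitrary illustrative choices, and the history of appearances is hypothetical.

```python
def pseudofrequency_trace(appearances, alpha, beta, r0=0.5):
    """Values r_0, r_1, ... of the unconditioned pseudofrequency, (109).

    appearances: booleans, True if the property I appears at that step.
    """
    if not (0 < alpha < 1 and 0 < beta < 1 and 0 < r0 < 1):
        raise ValueError("alpha, beta and r0 must lie strictly in (0, 1)")
    r = r0
    trace = [r]
    for appears in appearances:
        if appears:
            r = r + alpha * (1 - r)   # upper line of (109)
        else:
            r = r - beta * r          # lower line of (109)
        trace.append(r)
    return trace


# Hypothetical history: I appears for 100 steps, then is absent for 100.
trace = pseudofrequency_trace([True] * 100 + [False] * 100, alpha=0.1, beta=0.1)
print(trace[100], trace[200])
```

Each step moves r a fixed fraction of the way toward 1 or toward 0, so its rate of change does not shrink as the history grows, unlike a running frequency.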

If now in the conditional probability machine described above we
replace the unconditioned frequencies of the properties by the
pseudofrequencies calculated with the aid of equation (109) and leave
the neuron functioning method as before, then by a suitable choice of
the quantities $\alpha$ and $\beta$ we can imitate the biological
processes of the development and decay of conditioned reflexes much
more closely than with the original conditional probability machine.
The conditioned frequencies of the properties will, as before, be
computed using equation (107); however, the unconditioned
pseudofrequencies will now appear in place of the unconditioned
frequencies in the right side of this equation. Therefore equation
(107) will now give not the conditioned frequency, but a quantity which
it is natural to term the conditioned pseudofrequency of one property
with relation to another.

We can, moreover, by defining the concept of the conditioned
pseudofrequency in a somewhat different way, improve the conditional
probability machine so that it will determine the conditioned
pseudofrequencies of the properties directly, without preliminary
calculation and memorizing of their unconditioned frequencies or
pseudofrequencies. To do this we introduce into the unitary
classification machine paired directed connections between the neurons.
Each such connection is assigned a weight, which can take any real value
on the interval (0, 1). These weights can vary at every training step.
We shall denote the weight of the connection between the neurons P and
Q at the nth training step by $\lambda_n(P, Q)$. We note that the
weights $\lambda_n(P, Q)$ and $\lambda_n(Q, P)$ are not necessarily
equal to one another.

The law of variation of the connection weights is specified by the
following relations, defined for all values of n = 0, 1, 2, ...:

$$\lambda_{n+1}(P, Q) = \lambda_n(P, Q),$$
if at the (n + 1)th training step the neuron P is not stimulated;

$$\lambda_{n+1}(P, Q) = \lambda_n(P, Q) + \alpha\,(1 - \lambda_n(P, Q)),$$
if at the (n + 1)th training step both neurons P and Q are stimulated;

$$\lambda_{n+1}(P, Q) = \lambda_n(P, Q) - \beta\,\lambda_n(P, Q), \qquad (110)$$
if at the (n + 1)th training step the neuron P is stimulated and the
neuron Q is not stimulated.


Here, just as in equation (109), $\alpha$ and $\beta$ are positive
constants which are strictly smaller than unity, and the initial weight
$\lambda_0(P, Q)$ can be chosen arbitrarily on the interval (0, 1).
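The weight law (110) for a single directed connection can be sketched as follows ($\alpha = \beta = 0.2$ and the initial weight 0.5 are illustrative choices, and the history is hypothetical). The usage lines also show why $\lambda_n(P, Q)$ and $\lambda_n(Q, P)$ can differ: when Q is stimulated without P, only the connection from Q to P is weakened.

```python
def update_weight(weight, p_stimulated, q_stimulated, alpha, beta):
    """One step of the law (110) for the directed weight lambda(P, Q)."""
    if not p_stimulated:
        return weight                          # first line: P not stimulated
    if q_stimulated:
        return weight + alpha * (1 - weight)   # second line: P and Q stimulated
    return weight - beta * weight              # third line: P without Q


# Hypothetical history of (P stimulated, Q stimulated) pairs: five joint
# steps, then five steps in which only Q is stimulated.
steps = [(True, True)] * 5 + [(False, True)] * 5
lam_pq = lam_qp = 0.5
for p_on, q_on in steps:
    lam_pq = update_weight(lam_pq, p_on, q_on, alpha=0.2, beta=0.2)
    lam_qp = update_weight(lam_qp, q_on, p_on, alpha=0.2, beta=0.2)
print(round(lam_pq, 4), round(lam_qp, 4))
```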

In the purely qualitative aspect the weight $\lambda_n(P, Q)$
behaves exactly like the conditioned frequency $p_n(J/I)$ of the
property J, corresponding to the neuron Q, with respect to the property
I, corresponding to the neuron P. Indeed, with simultaneous appearance
of the properties I and J both the quantity $\lambda_n(P, Q)$ and the
quantity $p_n(J/I)$ increase. Both these quantities diminish with the
appearance of the property I without the simultaneous appearance of the
property J. If the first situation (simultaneous appearance of the
properties I and J) repeats itself many times in a row, then the
quantities $\lambda_n(P, Q)$ and $p_n(J/I)$ tend to unity. With
numerous repetitions of the second situation (the appearance of I
without J) both these quantities tend toward zero. Therefore it is
natural to term the quantity $r_n(J/I) = \lambda_n(P, Q)$ the
conditioned pseudofrequency of the property J with respect to the
property I.

If we require that in the conditional probability machine the
neurons which are not directly stimulated deliver as their output
signals not the conditioned frequencies, but the conditioned
pseudofrequencies of the properties corresponding to them with respect
to the maximal positive property I of the image being considered at the
given step, then to accomplish this it is sufficient to transfer from
the neuron P, corresponding to the property I, the output signals
$\lambda_n(P, R)$ to all the neurons R which are not directly
stimulated. Here it is convenient to consider that the neuron P sends
its unit output signal along all the connections of the form (P, R),
and that this signal is attenuated along each connection by
multiplication by the quantity $\lambda_n(P, R)$, which by the
condition is always less than unity.

It is not difficult to see that this mechanism for the formation
of the conditioned reflexes has still another essential deficiency:
under the assumptions we have made, the conditioned reflexes are formed
only with respect to entire images and not to their individual
properties. As is known, in biological systems the situation is
different: there the reflexes are most frequently formed precisely with
respect to partial (not maximal) properties of the images.

It is true that in a conditional probability machine of either of
the types described above we can fix once and for all the property I
with relation to which the conditioned frequencies and
pseudofrequencies are computed. In this case the property need not be
the maximal positive property, and the conditioned reflex will be
developed precisely with respect to a property of the images and not
to the image itself. All the definitions made above of the laws of the
functioning of the conditional probability machine also apply to this
case, except that there is no need to make a special search for the
neuron corresponding to the maximal positive property of the image
being considered: whenever the property I appears, the role of this
neuron is always played by the neuron corresponding to the property I.
However, when the property I does not appear, all the neurons (with the
exception of those which are directly stimulated) must, by definition,
deliver not the conditioned, but the unconditioned frequencies (or
pseudofrequencies) of the properties corresponding to them.
In his original definition Uttley considered precisely this method
of functioning of the conditional probability machine. However, in
avoiding one deficiency we come upon another, and again obtain a system
which differs significantly from biological systems, which are capable
of developing conditioned reflexes with respect to several properties,
and not just to one of them.

In order to avoid these last deficiencies, it is sufficient to remove
from the last of the schemes we have described the limitation which
permits the output signals of the neurons which are not directly
stimulated to be formed only from the output signal of the single
neuron P. In place of this we make the following assumption.

To every neuron Q which is not directly excited we transfer the
signals $\lambda_n(P_i, Q)$ from all the directly stimulated neurons
$P_i$ $(i = 1, 2, \ldots)$. The output signal of the neuron Q will be
the largest of the signals $\lambda_n(P_i, Q)$ $(i = 1, 2, \ldots)$.

The device which realizes this mechanism for generating the neuron
output signals and for altering the connection weights in accordance
with equation (110) is termed a conditioned reflex machine. We can hope
that it reflects many important properties of the real neural networks
which constitute the nervous systems of animals and even of humans.
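A compact sketch of the conditioned reflex machine for a handful of neurons, combining law (110) with the largest-signal rule described above. The neuron names "bell", "food" and "light", the constants, and the convention that a neuron outputs 0 when nothing is stimulated are assumptions made for the illustration; a complete machine would have one neuron per positive property.

```python
class ConditionedReflexMachine:
    """Sketch: directed weights lambda(P, Q) follow law (110); a neuron
    that is not directly stimulated outputs the largest lambda(P_i, Q)
    over the directly stimulated neurons P_i."""

    def __init__(self, neurons, alpha, beta, initial=0.5):
        self.neurons = list(neurons)
        self.alpha, self.beta = alpha, beta
        self.weight = {(p, q): initial for p in self.neurons
                       for q in self.neurons if p != q}

    def train(self, stimulated):
        """Apply one training step of law (110) to every connection."""
        for (p, q), w in list(self.weight.items()):
            if p in stimulated:
                if q in stimulated:
                    w = w + self.alpha * (1 - w)
                else:
                    w = w - self.beta * w
            self.weight[(p, q)] = w

    def outputs(self, stimulated):
        """Output signal of every neuron for the given stimulated set."""
        out = {}
        for q in self.neurons:
            if q in stimulated:
                out[q] = 1.0
            elif stimulated:
                out[q] = max(self.weight[(p, q)] for p in stimulated)
            else:
                out[q] = 0.0   # assumption: no stimulus, no signal
        return out


# Hypothetical neurons; "bell" plays the role of I and "food" that of J.
machine = ConditionedReflexMachine(["bell", "food", "light"], alpha=0.2, beta=0.2)
for _ in range(20):
    machine.train({"bell", "food"})      # the reflex is developed
developed = machine.outputs({"bell"})["food"]
for _ in range(30):
    machine.train({"bell"})              # the reflex is not reinforced
decayed = machine.outputs({"bell"})["food"]
print(round(developed, 3), round(decayed, 3))
```

Because outputs are taken as a maximum over all stimulated neurons, a reflex developed toward any one of several simultaneous properties shows up in the response.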

In real neuron networks, of course, not all the connections
required by the complete circuit of the conditioned reflex machine are
present. Further improvement of the laws of variation of the connection
weights is also possible. In particular, the first of equations (110)
should apparently be replaced by the equation
$\lambda_{n+1}(P, Q) = (1 - \gamma)\,\lambda_n(P, Q)$, where $\gamma$ is
a very small positive constant. After this refinement the law of
variation of the connection weights will reflect the tendency of the
connections to weaken in the course of time even in the absence of
cases of direct nonreinforcement of the reflex which is represented by
this connection.
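The proposed refinement of the first of equations (110) amounts to changing a single branch of the update. In this sketch $\gamma = 0.001$, the starting weight 0.9, and the other constants are arbitrary illustrative values; an idle connection then decays geometrically, by the factor $1 - \gamma$ per step.

```python
def update_weight_with_forgetting(weight, p_stimulated, q_stimulated,
                                  alpha, beta, gamma):
    """Law (110) with its first line replaced by (1 - gamma) * weight."""
    if not p_stimulated:
        return (1 - gamma) * weight       # slow spontaneous weakening
    if q_stimulated:
        return weight + alpha * (1 - weight)
    return weight - beta * weight


# A well-developed connection left idle (P never stimulated) for 1000 steps.
w = 0.9
for _ in range(1000):
    w = update_weight_with_forgetting(w, False, False,
                                      alpha=0.2, beta=0.2, gamma=0.001)
print(round(w, 3))
```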

If in the conditioned reflex machine we drop the requirement of
completeness, i.e., the presence of neurons for all positive properties
without exception of all possible images without exception, then such
incomplete conditioned reflex machines can be used for operation with a
large number of receptors. On this basis we can possibly use the
conditioned reflex machines for solving the problems of visual pattern
recognition considered in the preceding sections. To obtain effective
results in this direction it is necessary to make a purposeful
selection of the properties to which the neurons of the incomplete
machine will correspond.

Still another problem associated with classification systems is
of interest. In the classification systems described so far all the
properties of the images are perceived simultaneously with the images
themselves. In many cases, however, we must deal with images whose
properties are manifested gradually, in the course of the training.
Moreover, in these cases the images, as a rule, are so remote from
their visual prototypes that we shall term them concepts rather than
images.

Such a situation arises in the design of self-improving systems 
for the recognition of the meaning of phrases (see Glushkov, Trish- 
chenko and Stogniy [30]). In the simplest case, when we consider 
phrases consisting of only a subject and predicate, the concepts being 
recognized may be considered to be the nouns selected as the subjects, 
and their properties may be the possibility or impossibility of their 
meaningful combination with the various verbs appearing as predicates. 

If the verb list is fixed and includes n different verbs, then
each concept (noun) can be characterized by a line of compatibility
with these verbs, of length n. At the ith position of this line there
will be a one or a zero according to whether the combination of the
noun considered with the ith verb of the given list (i = 1, 2, ..., n)
is meaningful or not.
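Such a line of compatibility is simply a 0/1 vector indexed by the fixed verb list. The sketch below assumes a hypothetical list of four verbs.

```python
# Hypothetical fixed verb list (n = 4).
VERBS = ["live", "think", "go", "fly"]


def verb_line(compatible_verbs, verbs=VERBS):
    """Position i holds 1 if the noun combines meaningfully with the
    i-th verb of the list, and 0 otherwise."""
    return [1 if verb in compatible_verbs else 0 for verb in verbs]


print(verb_line({"live", "think", "go"}))  # [1, 1, 1, 0]
```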

We shall call these lines the verb lines. The problem of the
self-improving system in this case consists in predicting the largest
possible number of properties of the concepts considered on the basis
of a limited experiment. If the experience (training history) consists
in communicating to the system, one after another, meaningful
combinations of randomly selected nouns and verbs from the given lists,
then in the course of the arrival of such information certain places of
the verb lines of the selected concepts (nouns) will gradually be
filled with ones. In this case the training problem can be treated as
the problem of reconstructing the structure of the unitary machine for
the classification of the properties of the selected concepts. To solve
this problem it is natural to organize a process of combining concepts
which are compatible with the same verbs into classes which correspond
to new generalized concepts, which may not be present in the original
list. For example, as the result of combining several concepts (say,
"father," "son," "student," "professor," and so on) by the criterion of
compatibility with the verbs "live" and "think," the concept "human"
arises. If after the formation of a particular class it turns out that
certain of its representatives have some new elementary property (for
example, the possibility of combining with the verb "go"), then this
property can be extended to the entire class (i.e., to all the concepts
occurring in this class). Of course, errors may occur in this extension
of the properties. To eliminate them we introduce a process in which
the machine itself composes new phrases by combining other (randomly
selected) concepts from the class considered with the verb whose
compatibility was extended over the entire class. The sense (or
nonsense) of these combinations must be communicated to the machine by
the human teacher.
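The filling of the verb lines, the formation of a class, and the extension of a property can be sketched as follows. This does not reproduce the algorithms of [30]: the rule that a verb is extended to a class once at least half of its members are known to combine with it (the `threshold` parameter) is a hypothetical choice, and the nouns, verbs, and phrases are illustrative. The last function returns the phrases the machine would put to the teacher.

```python
def fill_verb_lines(nouns, verbs, phrases):
    """Start from all-zero verb lines; set a one for each meaningful
    (noun, verb) combination communicated during training."""
    index = {verb: i for i, verb in enumerate(verbs)}
    lines = {noun: [0] * len(verbs) for noun in nouns}
    for noun, verb in phrases:
        lines[noun][index[verb]] = 1
    return lines


def form_class(lines, verbs, criteria):
    """Nouns whose verb lines already hold a one for every criterion verb."""
    positions = [verbs.index(verb) for verb in criteria]
    return [noun for noun, line in lines.items()
            if all(line[i] for i in positions)]


def questions_for_teacher(lines, verbs, members, verb, threshold=0.5):
    """Extend `verb` to the whole class if at least `threshold` of its
    members already combine with it; return the untested phrases."""
    i = verbs.index(verb)
    share = sum(lines[noun][i] for noun in members) / len(members)
    if share < threshold:
        return []
    return [(noun, verb) for noun in members if not lines[noun][i]]


nouns = ["father", "son", "student", "professor", "stone"]
verbs = ["live", "think", "go", "fall"]
phrases = [(n, v) for n in nouns[:4] for v in ("live", "think")]
phrases += [("father", "go"), ("student", "go"), ("stone", "fall")]
lines = fill_verb_lines(nouns, verbs, phrases)
human = form_class(lines, verbs, ["live", "think"])
print(human)
print(questions_for_teacher(lines, verbs, human, "go"))
```

The teacher's answers to these questions would then be written into the verb lines in the same way as the original training phrases.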

Thanks to the existence in language of connections similar to the
one described by a statement of the type "almost all those who think
can also speak," the described process, with judiciously chosen methods
of forming the classes, extrapolating the properties, and composing new
phrases, makes it possible for the machine to separate phrases into
meaningful and meaningless ones correctly with high probability. It is
essential here that this separation be performed for all phrases which
can be constructed from the given sets of nouns and verbs. In
particular, they may include phrases which have neither been put to the
teacher as questions by the machine nor communicated by the teacher to
the machine in the training process.

Experiments on training a machine to recognize the meaning of
phrases, not only of the simplest construction just described but also
of more complex structure, have been conducted successfully at the
Institute of Cybernetics of the Academy of Sciences of the Ukrainian
SSR [30]. Under various assumptions about the structure of the language
(in the present case, the sets of verb lines) we can estimate the
learning rate of the realized algorithms, similarly to the way this was
done for the α-perceptrons in the preceding section. However, the
corresponding arguments are quite cumbersome and considerably less
graphic than in the case of the perceptrons.
§9. SELF-ORGANIZATION AND SELF-ADAPTATION. METHODS OF SOLUTION OF COMPLEX VARIATIONAL PROBLEMS


In §1 of the present chapter we introduced the concept of the
algorithm system as the natural form in which the properties of
self-organization and self-improvement are combined. Let us consider
this concept in more detail. For most of the problems encountered in
practice it is advisable to distinguish self-organization proper from
the so-called self-adaptation, which is the simplest case of
self-improvement. More precisely, we shall distinguish the simplest type
of self-improvement, based on self-adaptation, from the higher type of
self-improvement, based on self-organization.

The difference involved here is that self-improvement based on
self-adaptation assumes the variation of only certain numerical
parameters of the operational algorithm, while self-organization is
associated with the variation of the structure of the algorithm itself.
Of course, this difference is to a considerable degree artificial, since
with a suitable way of writing the algorithms, variations in the
algorithm structure can be reduced to variations of numerical
parameters. If, for example, all the algorithms of the class considered
are numbered, then any alteration of the algorithm reduces to a change
of the corresponding number, which can be considered as a numerical
parameter. However, in spite of the relativity of this difference, when
some fixed form of writing the algorithms (an algorithmic language) is
used, the difference can be drawn quite sharply.

We shall present some examples of self-adaptation and of
self-improvement of the structure of algorithm schemes. As the first
example let us consider a case of self-adaptation, frequently
encountered in practice, which carries the special name of extremal
regulation. The essence of the problem of extremal regulation consists
in supplying to the regulation system those values of certain
parameters $x_1, x_2, \ldots, x_n$ for which a specified function
$f = f(x_1, x_2, \ldots, x_n)$ of these parameters takes on an extremal
(maximal or minimal) value. Here the function $f(x_1, x_2, \ldots, x_n)$
also depends on certain other parameters $y_1, y_2, \ldots, y_m$, which
vary regardless of our wishes and cannot be controlled directly. As a
result of their change, the values of the parameters
$x_1, x_2, \ldots, x_n$ which provide the desired extremum of the
function f cannot be selected once and for all: they must be altered
along with the changes of the parameters $y_1, y_2, \ldots, y_m$.

In practice we most frequently encounter the case when finding 
the extrema by the conventional methods (with the aid of equating the 
partial derivatives of the function f to zero and the solution of the 
resulting system of equations) is impossible or inexpedient. The rea- 
son for this may be, for example, the absence of an analytic expres- 
sion for the function f. The question arises of what methods can be 
used for the solution of the problem of self-adaptation in this sit- 
uation. One of the universal methods for the solution of this problem 
in this case is the so-called method of steepest descent (or steepest 
ascent). 

The method of steepest descent (ascent) serves for finding the
minimum (maximum) points of a function of many variables by
constructing a special process of successive approximation to these
points. Let $f(x_1, x_2, \ldots, x_n)$ be any differentiable function of
n variables. In the space of these variables we select an arbitrary
point $M(a_1, a_2, \ldots, a_n)$ and find approximate values of the
partial derivatives $f_i = f'_{x_i}(a_1, a_2, \ldots, a_n)$ at the point
M from difference quotients, obtained by giving all the variables in
turn the same increment Δ. Let us find out what increments must be
given simultaneously to all the arguments in order to approach the
extremum point to the maximal possible degree using only the values of
the function f and its derivatives at the point M.
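The difference quotients mentioned above can be sketched as forward differences, $f_i \approx (f(a_1, \ldots, a_i + \Delta, \ldots, a_n) - f(a_1, \ldots, a_n))/\Delta$. The forward form and $\Delta = 10^{-6}$ are assumptions, and the test function is hypothetical.

```python
def approximate_partials(f, point, delta=1e-6):
    """Estimate f_i at `point` by giving each variable in turn the same
    increment delta (forward differences)."""
    base = f(point)
    partials = []
    for i in range(len(point)):
        shifted = list(point)
        shifted[i] += delta
        partials.append((f(shifted) - base) / delta)
    return partials


# Hypothetical function of two variables; its exact partial derivatives
# at (0, 0) are f_1 = 2 and f_2 = -2.
def f(x):
    return -(x[0] - 1) ** 2 - 2 * (x[1] + 0.5) ** 2


print(approximate_partials(f, [0.0, 0.0]))
```

The error of a forward difference is of the order of Δ, so the estimates here differ from 2 and -2 by about $10^{-6}$.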

This latter condition is quite essential, since without it the
question would be solved trivially: the increments of the variables
could be chosen so as to lead us directly to the extremum point.
However, we do not know the extremum point and are required to decide
how to approach it on the basis of information about the behavior of
the considered function in the neighborhood of the selected arbitrary
point $M(a_1, a_2, \ldots, a_n)$. Denoting the desired increments of the
variables $x_1, x_2, \ldots, x_n$ at the point M by
$\Delta_1, \Delta_2, \ldots, \Delta_n$ respectively and using the
expression for the total differential, we obtain for the increment Δf of
the function f at the point M the approximate equation

$$\Delta f \approx f_1 \Delta_1 + f_2 \Delta_2 + \ldots + f_n \Delta_n.$$

If we agree to take a step in the direction of the extremum point
(in the space of the variables $x_1, x_2, \ldots, x_n$) only of some one
constant length r, then still another equation is added to the equation
for the increment of the function f:

$$\Delta_1^2 + \Delta_2^2 + \ldots + \Delta_n^2 = r^2. \qquad (111)$$
We must choose the quantities $\Delta_1, \Delta_2, \ldots, \Delta_n$ so
that, with equation (111) satisfied, the increment Δf reaches a maximal
(taking the sign into account) value. Using the method of undetermined
Lagrange multipliers, the question reduces to finding the extremum of
the function

$$F = \sum_{i=1}^{n} \left( f_i \Delta_i - \lambda \Delta_i^2 \right) + \lambda r^2$$

of the variables $\Delta_1, \Delta_2, \ldots, \Delta_n$. Differentiating
with respect to $\Delta_i$ and equating the resulting partial
derivatives to zero, we obtain the system of equations

$$f_i - 2\lambda \Delta_i = 0 \qquad (i = 1, 2, \ldots, n), \qquad (112)$$

which must be supplemented, of course, by equation (111). From
equations (112) we find that

$$\Delta_i = k f_i, \qquad (113)$$

where $k = 1/(2\lambda)$ is the coefficient of proportionality, common
for all $i = 1, 2, \ldots, n$.

Depending on the sign of the coefficient of proportionality k,
equations (113) determine two opposite directions along which movement
from the point M leads (in the vicinity of the point M) to the most
rapid increase (with k > 0) or the most rapid decrease (with k < 0) of
the function f. These directions are termed respectively the directions
of steepest ascent and steepest descent at the considered point M. The
magnitude r of the advance in either of these directions is termed
respectively the ascent or descent gradient step at the point M.
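Substituting (113) into (111) fixes the magnitude of k, and substituting it into the expression for Δf confirms which sign of k gives ascent (a short worked derivation, valid when not all $f_i$ vanish):

$$\sum_{i=1}^{n} \Delta_i^2 = k^2 \sum_{i=1}^{n} f_i^2 = r^2 \;\Longrightarrow\; k = \pm \frac{r}{\sqrt{\sum_{i=1}^{n} f_i^2}},$$

$$\Delta f \approx \sum_{i=1}^{n} f_i \Delta_i = k \sum_{i=1}^{n} f_i^2 = \pm\, r \sqrt{\sum_{i=1}^{n} f_i^2}.$$

The plus sign (k > 0) therefore gives the largest first-order increase of f attainable with a step of length r, and the minus sign (k < 0) the largest decrease.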

Depending on whether we are required to find the maximum or the
minimum point of the function f, we select one of these directions (the
sign of the parameter k in equations (113)) and move in the selected
direction (the extent of the movement being determined by the choice of
the magnitude of k) until the function f changes the nature of its
variation in this direction, i.e., switches from increase to decrease
or, vice versa, from decrease to increase. In other words, we make the
maximal advance in the selected direction that still changes the
function f in the desired sense (decreasing when we search for the
minimum point of f and increasing when we search for its maximum
point).
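A sketch of the whole procedure in Python, under stated assumptions: derivatives are estimated by forward differences, the step length r = 0.01 is fixed, and "moving until f changes the nature of its variation" is interpreted as repeating steps of length r along the chosen direction while each step still increases f. The test function is hypothetical.

```python
import math


def approximate_partials(f, point, delta=1e-6):
    """Forward-difference estimates of the partial derivatives f_i."""
    base = f(point)
    partials = []
    for i in range(len(point)):
        shifted = list(point)
        shifted[i] += delta
        partials.append((f(shifted) - base) / delta)
    return partials


def steepest_ascent(f, start, r=0.01, tol=1e-9, max_moves=10_000):
    """Steepest ascent with gradient steps of fixed length r.

    At each point the direction Delta_i = k f_i with k = r / |grad f| is
    chosen (k > 0 for ascent); steps are repeated along it while f still
    increases, and then the direction is recomputed at the new point.
    """
    point = list(start)
    for _ in range(max_moves):
        grad = approximate_partials(f, point)
        norm = math.sqrt(sum(g * g for g in grad))
        if norm < tol:
            break                              # approximately stationary
        k = r / norm
        step = [k * g for g in grad]
        moved = False
        for _ in range(max_moves):
            candidate = [x + s for x, s in zip(point, step)]
            if f(candidate) <= f(point):
                break                          # growth along this line stopped
            point, moved = candidate, True
        if not moved:
            break                              # no step of length r increases f
    return point


# Hypothetical test function with its maximum at (1, -0.5).
def f(x):
    return -(x[0] - 1) ** 2 - 2 * (x[1] + 0.5) ** 2


print([round(c, 2) for c in steepest_ascent(f, [0.0, 0.0])])
```

With a fixed step length the process stops within roughly one step r of the stationary point; a smaller r gives a finer result at the cost of more evaluations of f.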

Denoting by the letter N the point obtained as the result of this
movement, we perform the same operations with it that were described
for the point M. As a result we obtain a new point P, and so on. If the
function f is sufficiently smooth, then by continuing the process
described sufficiently long we arrive at an arbitrarily small
neighborhood of some stationary point of the given function, i.e., a
point at which all the partial derivatives of the function are equal to
zero (of course, this is true only when the given function has
stationary points), or at the neighborhood of a point of the boundary
of the domain of definition of the function f corresponding to some
local extremal (in the given domain) value of f.

Under the assumptions made, the desired point of the (absolute)
extremum of the function f is among the indicated points. However, there
is no guarantee that the application of the method described above will
lead to the point of absolute extremum the first time. Therefore, by
varying the initial point M we must find (by the method described
above) new stationary points of the function, so that by subsequently
comparing the values of the function at these points we can select from
among them the desired extremum point.

In practice a different route is generally preferred: we select a random series of points $M_1, M_2, \ldots, M_k$ in the domain of definition of the function f, and from them we choose the one at which the function has the maximal (when finding maximum points) or minimal (when finding minimum points) value. Starting from the point thus chosen, we perform the steepest descent or steepest ascent by the methodology described above. With sufficiently large k (depending on the choice of the function f), the point of absolute (and not merely some local) extremum can be found in this way with a probability arbitrarily close to unity.
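The random-start variant can be sketched by combining random sampling with a compact version of the same descent. The box-shaped domain, the sample size k = 50, the fixed random seed and the two-well test function are assumptions chosen for illustration; the function names are hypothetical.

```python
import math
import random


def descend(f, x, step=0.05, min_step=1e-9, max_iter=10_000, h=1e-7):
    """Compact steepest descent: march along -grad f while f decreases,
    halve the increment when no progress is possible."""
    x = list(x)
    for _ in range(max_iter):
        if step < min_step:
            break
        g = []
        for i in range(len(x)):
            fwd, bwd = list(x), list(x)
            fwd[i] += h
            bwd[i] -= h
            g.append((f(fwd) - f(bwd)) / (2 * h))
        norm = math.sqrt(sum(gi * gi for gi in g))
        if norm == 0.0:
            break
        d = [-gi / norm for gi in g]
        fx, moved = f(x), False
        for _ in range(100_000):
            trial = [xi + step * di for xi, di in zip(x, d)]
            ft = f(trial)
            if ft >= fx:
                break
            x, fx, moved = trial, ft, True
        if not moved:
            step /= 2
    return x


def multistart_descent(f, bounds, k, rng):
    """Sample k random points M_1..M_k in the box `bounds`, keep the one with
    the smallest value of f, and run steepest descent from it."""
    points = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(k)]
    start = min(points, key=f)
    return descend(f, start)


# Two wells: a local minimum near x = 0.96 and the absolute one near x = -1.04.
two_wells = lambda v: (v[0] ** 2 - 1) ** 2 + 0.3 * v[0]
print(multistart_descent(two_wells, [(-2.0, 2.0)], k=50, rng=random.Random(1)))
```

A single descent started near x = 1 would stop in the shallower well; choosing the lowest of many random points first makes it very likely that the descent begins in the basin of the absolute minimum.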

One of the possible variants of the search for the absolute maximum of a function of two variables is shown in Fig. 15. In this figure the function is specified by its contour lines (lines of equal level), $M_1, M_2, M_3$ denote the randomly selected initial points (the point $M_1$ is the highest of them), and N, P, Q denote the sequence of points obtained from the point $M_1$ by the steepest ascent method. In this example, after only three steps of steepest ascent we arrive at the point of absolute maximum of the considered function.

The algorithm system which solves the problem of self-adaptation consists of the operational algorithm A which, receiving the input word (the value of the function $f(x_1, x_2, \ldots, x_n)$ defining the criterion of the regulation quality and, possibly, the values of some other quantities), delivers an output word consisting of the coordinates of some point $M(a_1, a_2, \ldots, a_n)$, which in the stationary regime (invariance of the function f) coincides with the point of absolute extremum of this function. In the nonstationary regime (variation of the function f) the algorithm B comes into play: it searches for the point of absolute extremum of the altered function f by some method (the method of steepest descent or ascent, for example) and replaces the parameters $a_1, a_2, \ldots, a_n$ put out by the operational algorithm A with the coordinates of this point.

The described system, consisting of the algorithms A and B, can 
be considered as a self-adaptive system of algorithms. 

[Fig. 15]

Let us now consider an example of self-improvement with variation of the structure of the operational algorithm. Let us assume that the operational algorithm must deliver the roots of the reduced quadratic equation $x^2 + px + q = 0$ for various numerical values of the coefficients p and q, although at the beginning of operation this algorithm is not yet known. Since the reduced quadratic equation is solved by the known formula $x_{1,2} = -\frac{p}{2} \pm \sqrt{\frac{p^2}{4} - q}$, the desired algorithm can be sought in the class of formulas constructed with the aid of the operations of addition, subtraction, multiplication, division and extraction of the square root, using the letters p and q and whole numbers. All such formulas can be numbered after first arranging them in order of increasing complexity: as the number of a formula increases there is, generally speaking, an increase in the number of operations used in it and in the maximal magnitude of the integer parameters it contains.

Initially, one of the simplest formulas, say p + q, is selected as the operational algorithm. Given successive values of the coefficients p and q (p = 3, q = 2, for example), the operational algorithm delivers the corresponding values of the root or roots (in the present case x = p + q = 5). The training algorithm verifies the solution obtained by substituting the root found into the original equation. If the roots satisfy the equation, the operational algorithm is retained unchanged. If, however, the solution is found to be incorrect (as in the case just considered, since $5^2 + 3 \cdot 5 + 2 = 42 \neq 0$), then the next formula in order is selected as the operational algorithm.

It is easy to see that with the described organization of the operational and training algorithms, the correct formula for the solution of the quadratic equation will be established after a finite number of unsuccessful attempts. Since in the process of the search there is, generally speaking, a change in the structure of the algorithm (the form of the formula) and not simply in its numerical parameters, then according to the terminology we have adopted the constructed algorithm system is self-improving on the basis of self-organization, i.e., it exhibits a higher type of self-improvement than self-adaptation.
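A toy version of this self-improving system may make the mechanism concrete. The candidate list below is a hypothetical, hand-ordered stand-in for the full enumeration of formulas in p, q, integers and the five operations; the training algorithm checks every delivered root by substitution and, on failure, moves on to the next formula. Names such as `train` and `CANDIDATES` are ours.

```python
import math

# Hypothetical stand-in for the numbered class of formulas, simplest first.
CANDIDATES = [
    ("p + q", lambda p, q: [p + q]),
    ("p - q", lambda p, q: [p - q]),
    ("-p", lambda p, q: [-p]),
    ("-p/2", lambda p, q: [-p / 2]),
    ("-p/2 +/- sqrt(p*p/4 - q)",
     lambda p, q: [-p / 2 + math.sqrt(p * p / 4 - q), -p / 2 - math.sqrt(p * p / 4 - q)]),
]


def train(candidates, problems, tol=1e-9):
    """Feed (p, q) pairs to the current formula; whenever a delivered root fails
    the substitution check x*x + p*x + q = 0, switch to the next formula."""
    index = 0
    for p, q in problems:
        while index < len(candidates):
            _, formula = candidates[index]
            try:
                roots = formula(p, q)
                correct = all(abs(x * x + p * x + q) <= tol for x in roots)
            except (ValueError, ZeroDivisionError):   # e.g. square root of a negative number
                correct = False
            if correct:
                break
            index += 1
        else:
            raise RuntimeError("no formula in the list passed the check")
    return candidates[index][0]


print(train(CANDIDATES, [(3, 2), (1, -6), (-5, 6)]))   # -p/2 +/- sqrt(p*p/4 - q)
```

A formula can pass a single check by accident: -p/2 is a root whenever $p^2 = 4q$, so with the problem (2, 1) it is accepted, and it is discarded only when a problem with distinct roots, such as (3, 2), arrives.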

In the example considered, the search for the required working algorithm is performed by simple sorting of all the algorithms of an a priori given class. We need not, of course, fix some special class of algorithms in advance; we can instead perform the sequential sorting in the class of all algorithms written in a particular fixed algorithmic language (for example, in the language of normal algorithms), although in this case the search time is, as a rule, considerably longer.

Normally, systems for such a search are realized on high-speed electronic computers which perform several thousand operations per second. At this speed the solution of the problem described (finding the formula for solving the reduced quadratic equation) takes little time. However, as the sought operational algorithms become more complex, the search time based on sorting all the variants increases so rapidly that carrying out such a search in a reasonable time becomes impossible, even on high-speed computers.

In this case we resort to a multistage search: we first look for some sufficiently simple component parts (blocks) of the desired operational algorithm, and then use various combinations of the constructed blocks. The blocks themselves can be built up from still smaller blocks, so that the division of the search into individual steps can be continued even further. Multistep search systems permit the construction of very complex self-organizing systems which are analogous to the creative search systems used by man.

Without going into further detail concerning self-improvement on 
the basis of self-organization, we shall concentrate our attention on 
the problems associated with self-adaptation. 

The method of steepest descent (or ascent) described above is also a certain sort of search. In this search we use a definite strategy (or tactic) for reducing the sorting of the various variants leading to the problem solution. In the case considered, the search strategy reduced to using information on the local properties of the corresponding function f, which we term the estimating function (the values of this function can be considered as an estimate of the quality of the approximation to the required solution found at a particular step of the search).

If the number of parameters (arguments of the estimating func- 
tion) is very large, various difficulties arise in the use of the 
method of steepest descent (ascent) in the simplest form described 
above (cycling on secondary minima or maxima, excessively slow rate of 
advance toward the absolute extremum, etc.). In order to eliminate 
these difficulties we introduce various alterations and improvements 
in the local search strategy described above. 

The simplest changes are associated with the choice of the particular descent (ascent) gradient step. In particular, it is desirable to perform the advance in a given direction not until the increment $\Delta f$ of the estimating function changes sign, but only until the relative magnitude of this increment, $|\Delta f / f|$, becomes less than some a priori fixed quantity termed the gradient test.
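The gradient test changes only the stopping rule of the advance along a fixed direction. A minimal sketch, assuming that the relative increment is measured against the value of f before the last increment (the text does not say which value of f is meant); the function name is hypothetical.

```python
def advance_with_gradient_test(f, x, d, step, eps, max_steps=100_000):
    """Advance from x along direction d in increments of `step`.

    Stops when f would stop decreasing (the basic rule) or as soon as the
    relative decrease |delta f / f| of the last increment falls below eps
    (the gradient test)."""
    x = list(x)
    fx = f(x)
    for _ in range(max_steps):
        trial = [xi + step * di for xi, di in zip(x, d)]
        ft = f(trial)
        delta = ft - fx
        if delta >= 0:                       # basic rule: f would stop decreasing
            break
        x, f_before, fx = trial, fx, ft
        if f_before != 0 and abs(delta / f_before) < eps:   # gradient test
            break
    return x


g = lambda v: v[0] ** 2 + 1
print(advance_with_gradient_test(g, [3.0], [-1.0], 0.1, eps=0.05))  # stops near x = 0.2
print(advance_with_gradient_test(g, [3.0], [-1.0], 0.1, eps=0.0))   # basic rule: near x = 0
```

With eps = 0.05 the advance on this function ends about 0.2 short of the minimum along the line, trading accuracy in one direction for an earlier change of direction.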

In many cases we can divide the variables of the estimating function f into two classes, so that variations of the variables of the first class lead to relatively large variations of the value of the function f, while variations of the variables of the second class alter this value to a considerably lesser degree. The methods described above provide too low a rate of advance toward the extremum in the directions corresponding to variations of the variables of the second class. Figuratively speaking, we can perform a rapid descent (in the direction corresponding to variations of the variables of the first class) to the bottom of some "gully" and then wander more or less randomly about its bottom without in practice getting any closer to the extremum point.

To eliminate this deficiency Gel'fand and Tsetlin [18] proposed a special method which they termed the gully method. The essence of the method is that two points ($X_0$ and $X_1$) rather than one are selected on the "slopes of the gully." From these points a steepest descent to the "bottom of the gully" is performed, as a result of which two new points $A_0$ and $A_1$ arise. Connecting the points $A_0$ and $A_1$ with a straight line, one performs in the direction of this line (in the direction of decrease of the estimating function) the so-called gully step, whose magnitude is usually considerably larger than that of the descent gradient step. This step leads to a new point $X_2$, from which we again perform a steepest descent to the point $A_2$ located on the "bottom" of the gully. In the direction defined by the points $A_1$ and $A_2$ we again make a gully step, leading to the new point $X_3$. From it we again perform a steepest descent to the "bottom of the gully," and so on.
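A sketch of the gully method, under several assumptions that are ours rather than Gel'fand and Tsetlin's: the descent to the bottom is a fixed small number of steepest-descent iterations, the gully step is taken from the lower of the last two bottom points away from the higher one, and the gully step is halved after an unsuccessful step (the description above uses a fixed gully step). Function names and the test function are hypothetical.

```python
import math


def descend(f, x, step=0.05, iters=100, h=1e-7):
    """A limited number of steepest-descent iterations: enough to reach the
    bottom of a gully, not meant to travel far along it."""
    x = list(x)
    for _ in range(iters):
        if step < 1e-12:
            break
        g = []
        for i in range(len(x)):
            fwd, bwd = list(x), list(x)
            fwd[i] += h
            bwd[i] -= h
            g.append((f(fwd) - f(bwd)) / (2 * h))
        norm = math.hypot(*g)
        if norm == 0.0:
            break
        d = [-gi / norm for gi in g]
        fx, moved = f(x), False
        for _ in range(10_000):
            trial = [xi + step * di for xi, di in zip(x, d)]
            ft = f(trial)
            if ft >= fx:
                break
            x, fx, moved = trial, ft, True
        if not moved:
            step /= 2
    return x


def gully_method(f, x0, x1, gully_step=0.5, iters=200, min_gully_step=1e-8):
    """Descend from X_0, X_1 to A_0, A_1, then repeatedly take a long step along
    the line through the last two bottom points and descend again."""
    a_prev, a_curr = descend(f, x0), descend(f, x1)
    best = min(a_prev, a_curr, key=f)
    for _ in range(iters):
        if gully_step < min_gully_step:
            break
        if f(a_curr) > f(a_prev):          # step from the lower point, away from the higher one
            a_prev, a_curr = a_curr, a_prev
        direction = [c - p for c, p in zip(a_curr, a_prev)]
        length = math.hypot(*direction)
        if length == 0.0:
            break
        x_next = [c + gully_step * di / length for c, di in zip(a_curr, direction)]
        a_next = descend(f, x_next)        # back down to the bottom of the gully
        if f(a_next) >= f(a_curr):
            gully_step /= 2                # assumption: shrink the step after a failure
        a_prev, a_curr = a_curr, a_next
        best = min(best, a_next, key=f)
    return best


# A narrow gully along the line x = y with its lowest point at (1, 1).
gully = lambda v: 100 * (v[0] - v[1]) ** 2 + 0.01 * (v[0] + v[1] - 2) ** 2
print(gully_method(gully, [0.0, 0.5], [0.5, 1.0]))
```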

We have described the method for the descent (i.e., for finding the minimum of the estimating function f). Of course, the corresponding constructions are applicable to the ascent (finding the maximum of the function f). In this case, in place of the descent into the "gully" we perform an ascent to the "ridge," and the further advance proceeds not along the "bottom of the gully" but along the "crest of the ridge."

All the described descent (ascent) methods belong to the class of methods for finding extrema of functions or functionals which we combine under the name of variational methods. In the majority of the cases of interest for cybernetics, particular limitations on the possible values of the arguments of the estimating function f take on considerable importance. In this case the sought extrema may be reached on the boundaries of the domain of definition of the function f rather than within the domain. The descent (ascent) methods considered above are in principle also suitable for finding such "boundary" extrema. However, in several particular cases certain special variational methods are far more effective.

Among methods of this sort are the so-called linear programming (or linear planning) methods. These methods are used when the estimating function f is a linear function (a polynomial of the first degree), $f = a_1 x_1 + a_2 x_2 + \ldots + a_n x_n + a_0$, and the boundaries of the domain of definition of the variables are composed of hyperplanes, i.e., surfaces specified by equations of the first degree. In this case the domain of definition is a multidimensional polyhedron (not necessarily finite), all points of which satisfy a system of linear inequalities of the form

$$b_{i1} x_1 + b_{i2} x_2 + \ldots + b_{in} x_n + b_i \geq 0 \qquad (i = 1, 2, \ldots, m).$$

The signs of the inequalities can, of course, be reversed by changing the signs of all the coefficients $b_{ij}$ and $b_i$.

It is not difficult to see that the estimating function f (provided it has an extremum on the polyhedron) takes its extremal value at one or several vertices; in the latter case f takes the same extremal value on the entire face (generally speaking, multidimensional) passing through all these vertices. Therefore we can find the desired extremal value by sorting one by one all the vertices of the polyhedron; however, with a large number of variables this method is extremely cumbersome and unsuitable in practice. Far more effective methods for the solution of the linear programming problem have already been developed. These methods use not the linear inequalities but rather linear equations, to which any inequalities can be reduced by introducing new unknowns. With this introduction the inequalities $\sum_{j=1}^{n} b_{ij} x_j + b_i \geq 0$ are replaced by the equations $\sum_{j=1}^{n} b_{ij} x_j + b_i = y_i$, where $y_i$ are new unknowns which can take only nonnegative values ($i = 1, 2, \ldots, m$).
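The introduction of the new unknowns is purely mechanical. A small sketch (hypothetical function name) that takes the coefficients $b_{ij}$ and free terms $b_i$ of the inequalities and returns the equations $\sum_j b_{ij} x_j - y_i = -b_i$ in the unknowns $x_1, \ldots, x_n, y_1, \ldots, y_m$, with one new unknown per inequality:

```python
def inequalities_to_equations(B, b):
    """Rewrite sum_j B[i][j]*x_j + b[i] >= 0 as sum_j B[i][j]*x_j - y_i = -b[i], y_i >= 0.

    Returns (rows, rhs): each row lists the coefficients of x_1..x_n followed
    by those of the new unknowns y_1..y_m.
    """
    m = len(B)
    rows = []
    for i, coeffs in enumerate(B):
        slack = [0] * m
        slack[i] = -1                      # the new unknown y_i enters row i only
        rows.append(list(coeffs) + slack)
    rhs = [-bi for bi in b]
    return rows, rhs


# x_1 + 2 x_2 - 4 >= 0 and -x_1 + 3 >= 0 (two variables, two inequalities)
rows, rhs = inequalities_to_equations([[1, 2], [-1, 0]], [-4, 3])
print(rows, rhs)   # [[1, 2, -1, 0], [-1, 0, 0, -1]] [4, -3]
```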

In practice we most frequently encounter the following statement of the linear programming problem, which we shall term the canonical form.

Given the system of m linear algebraic equations in n unknowns $\sum_{j=1}^{n} a_{ij} x_j = b_i$ ($i = 1, 2, \ldots, m$), it is required to find the nonnegative solution (all $x_j \geq 0$) of this system for which some fixed linear form (estimating function) $f = \sum_{j=1}^{n} c_j x_j$ takes the smallest possible value.

We shall describe one of the possible effective methods for the solution of this problem, which is usually termed the simplex method. In the simplex method we perform sequential transformations of the given system of equations $\sum_{j=1}^{n} a_{ij} x_j = b_i$ ($i = 1, 2, \ldots, m$) until it is reduced to some special form. First, all the free terms $b_i$ are made nonnegative (if $b_i < 0$, it is sufficient to change the signs of both sides of the ith equation); then the equations are rewritten in the form $0 = b_i - \sum_{j=1}^{n} a_{ij} x_j$ ($i = 1, 2, \ldots, m$). If the resulting system contains a variable $x_j$ appearing in only one equation, say the ith, with a positive coefficient $a_{ij}$, then that variable is called basic, and the corresponding equation is solved for this variable. Identifying all the basic variables and numbering them (after a suitable renumbering of the variables and equations, and keeping the same letters for the transformed coefficients and free terms) $x_1, x_2, \ldots, x_{k_0}$, we reduce our system to the form

$$x_k = b_k - \sum_{j=k_0+1}^{n} a_{kj} x_j \qquad (k = 1, 2, \ldots, k_0);$$

$$0 = b_l - \sum_{j=k_0+1}^{n} a_{lj} x_j \qquad (l = k_0 + 1, k_0 + 2, \ldots, m). \qquad (114)$$
The equations of the second group (not solved for any $x_k$) are termed the 0-equations (it is possible that all the equations of the system fall into this group). The purpose of the further transformations is to find some nonnegative solution of the system (114). These transformations reduce to the sequential (generally speaking, multiple) repetition of a cycle consisting of the following steps (see [6]):

1. We find a 0-equation whose free term is strictly greater than zero (if there is no such 0-equation, then the problem is solved, since by setting $x_k = b_k$ ($k = 1, 2, \ldots, k_0$) and $x_j = 0$ ($j = k_0 + 1, k_0 + 2, \ldots, n$) we obviously obtain a nonnegative solution of the system (114)). Let this be the $i_1$th equation.

2. In the found ($i_1$th) equation we identify some positive coefficient $a_{i_1 j_1}$ (if all the coefficients $a_{i_1 j}$ in the $i_1$th equation are nonpositive, then the system (114) obviously cannot have nonnegative solutions, since the right side of this equation is then at least $b_{i_1} > 0$ for all $x_j \geq 0$; consequently, the posed linear programming problem is unsolvable).

3. In the same column as the identified coefficient $a_{i_1 j_1}$ (i.e., in the $j_1$th column) we find the so-called resolving coefficient $a_{i_2 j_1}$, which is characterized by the fact that, of all the ratios $b_i / a_{i j_1}$ with positive $a_{i j_1}$ ($i = 1, 2, \ldots, m$), the ratio $b_{i_2} / a_{i_2 j_1}$ has the minimal possible value.
4. The equation containing the resolving coefficient (i.e., the $i_2$th equation) is solved for the variable $x_{j_1}$, which is thereafter assigned to the class of basic variables, and the expression found for $x_{j_1}$ is substituted into all the remaining equations (if the $i_2$th equation is not a 0-equation, the basic variable standing on its left side is excluded from the number of basic variables).

5. After solving the $i_2$th equation (for $x_{j_1}$) we again find a 0-equation with a positive free term, and the entire operation described above is performed with it.
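Steps 1-5 can be transcribed almost literally. The sketch below is one possible reading, with assumptions of ours: it uses exact rational arithmetic (`fractions.Fraction`), takes the first admissible 0-equation and column, breaks ties in the ratio test in favor of the chosen 0-equation itself, also eliminates 0-equations whose free term is zero (so that all 0-equations eventually disappear), and simply gives up after a fixed number of transformations instead of handling cycling. All names are hypothetical.

```python
from fractions import Fraction


def pivot(a, b, basis, r, c):
    """Solve equation r for x_c and substitute the result into all other equations."""
    piv = a[r][c]
    a[r] = [v / piv for v in a[r]]
    b[r] = b[r] / piv
    for i in range(len(a)):
        if i != r and a[i][c] != 0:
            factor = a[i][c]
            a[i] = [vi - factor * vr for vi, vr in zip(a[i], a[r])]
            b[i] = b[i] - factor * b[r]
    basis[r] = c  # x_c becomes basic; the old basic variable of row r (if any) leaves


def find_nonnegative_solution(a, b, max_pivots=1000):
    """Rows mean 0 = b[i] - sum_j a[i][j] x_j.  Returns a nonnegative solution
    (list of Fractions) or None when step 2 shows that none exists."""
    m, n = len(a), len(a[0])
    a = [[Fraction(v) for v in row] for row in a]
    b = [Fraction(v) for v in b]
    for i in range(m):                      # make every free term nonnegative
        if b[i] < 0:
            a[i], b[i] = [-v for v in a[i]], -b[i]
    basis = [None] * m                      # None marks a 0-equation
    for j in range(n):                      # variables that are basic from the start
        rows = [i for i in range(m) if a[i][j] != 0]
        if len(rows) == 1 and a[rows[0]][j] > 0 and basis[rows[0]] is None:
            pivot(a, b, basis, rows[0], j)
    for _ in range(max_pivots):
        pending = [i for i in range(m) if basis[i] is None and (b[i] != 0 or any(a[i]))]
        if not pending:                     # no 0-equations left: read off the solution
            x = [Fraction(0)] * n
            for i, j in enumerate(basis):
                if j is not None:
                    x[j] = b[i]
            return x
        i1 = min(pending, key=lambda i: (b[i] == 0, i))   # step 1: prefer b_i > 0
        if b[i1] == 0 and all(v <= 0 for v in a[i1]):
            a[i1] = [-v for v in a[i1]]     # with b = 0 the sign flip gives the same equation
        cols = [j for j in range(n) if a[i1][j] > 0]       # step 2
        if not cols:
            return None
        j1 = cols[0]
        rows = [i for i in range(m) if a[i][j1] > 0]       # step 3: minimal ratio b_i / a_ij1
        i2 = min(rows, key=lambda i: (b[i] / a[i][j1], i != i1, i))
        pivot(a, b, basis, i2, j1)                          # step 4
    raise RuntimeError("pivot limit reached (the process may be cycling)")


# x1 + x2 = 4, x1 - x2 = 2: no variable is basic at the start.
print(find_nonnegative_solution([[1, 1], [1, -1]], [4, 2]))   # x1 = 3, x2 = 1
```

The minimal-ratio choice in step 3 is what keeps every free term nonnegative after each substitution.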

The described process of sequential solution is continued until all the 0-equations disappear. The desired nonnegative solution is obtained by equating all the basic variables $x_k$ to the corresponding free terms $b_k$ ($k = 1, 2, \ldots, r$, where r is the number of basic variables) and all the nonbasic variables to zero. There are cases when the described process cycles and continues infinitely long without leading to any solution. In these cases we resort to varying the choice of the 0-equation and of the resolving coefficient in the corresponding steps of the sequential solution process, which usually helps prevent cycling.

After termination of the sequential solution process, the expressions found for the basic variables are substituted into the estimating function $f = \sum_{j=1}^{n} c_j x_j$, as a result of which it takes the form $f = d - \sum_{j=r+1}^{n} d_j x_j$. In the latter expression the summation extends only over the nonbasic variables, which (after a corresponding renumbering) are assigned numbers from r + 1 to n.

Then the solution process described above is applied to the system of equations for the basic variables obtained as the result of the first application of this process, which is written as

$$x_k = b_k - \sum_{j=r+1}^{n} a_{kj} x_j \qquad (k = 1, 2, \ldots, r),$$

and to the new 0-equation $0 = d - \sum_{j=r+1}^{n} d_j x_j$. As resolving coefficients we select only the coefficients $a_{kj}$ (in a column j with $d_j > 0$). The process terminates when all the coefficients $d_j$ in the 0-equation become nonpositive. Setting after this all the nonbasic variables equal to zero and all the basic variables equal to the corresponding free terms, we obtain the required solution of the original linear programming problem. We note that in both the first and the second applications of the sequential solution process all the free terms (with the exception of the term d) remain nonnegative at all times.
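The second stage can be sketched in the same style. The function below (hypothetical name) assumes that the first stage has already produced equations solved for the basic variables with nonnegative free terms; it computes d and the $d_j$ from the coefficients $c_j$, chooses the smallest-index column with $d_j > 0$, and breaks ratio-test ties by the smallest basic index. This tie-breaking (Bland's rule) is our choice; it is known to exclude cycling.

```python
from fractions import Fraction


def simplex_second_stage(a, b, basis, c, max_pivots=1000):
    """Minimize f = sum_j c[j] x_j subject to sum_j a[k][j] x_j = b[k], x >= 0.

    Row k must already be solved for the basic variable x[basis[k]]
    (unit column) with b[k] >= 0.  Returns (x, min f), or None when f is
    unbounded below.
    """
    m, n = len(a), len(c)
    a = [[Fraction(v) for v in row] for row in a]
    b = [Fraction(v) for v in b]
    c = [Fraction(v) for v in c]
    basis = list(basis)
    for _ in range(max_pivots):
        # f = d - sum_j d_j x_j after eliminating the basic variables
        d = sum(c[basis[k]] * b[k] for k in range(m))
        dj = [sum(c[basis[k]] * a[k][j] for k in range(m)) - c[j] for j in range(n)]
        columns = [j for j in range(n) if dj[j] > 0]
        if not columns:                     # every d_j <= 0: raising a nonbasic variable cannot lower f
            x = [Fraction(0)] * n
            for k, j in enumerate(basis):
                x[j] = b[k]
            return x, d
        j1 = columns[0]
        rows = [k for k in range(m) if a[k][j1] > 0]
        if not rows:                        # x_j1 can grow without limit while f decreases
            return None
        k2 = min(rows, key=lambda k: (b[k] / a[k][j1], basis[k]))   # resolving coefficient
        piv = a[k2][j1]
        a[k2] = [v / piv for v in a[k2]]
        b[k2] = b[k2] / piv
        for k in range(m):
            if k != k2 and a[k][j1] != 0:
                factor = a[k][j1]
                a[k] = [vk - factor * vr for vk, vr in zip(a[k], a[k2])]
                b[k] = b[k] - factor * b[k2]
        basis[k2] = j1
    raise RuntimeError("pivot limit reached")


# Minimize -x1 - x2 with x1 + 2 x2 + x3 = 4, 3 x1 + x2 + x4 = 6 (x3, x4 basic).
result = simplex_second_stage([[1, 2, 1, 0], [3, 1, 0, 1]], [4, 6], [2, 3], [-1, -1, 0, 0])
print(result)   # x1 = 8/5, x2 = 6/5, x3 = x4 = 0, f = -14/5
```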

Linear programming is widely used in problems of the optimal planning of transport shipments. Such applications of linear programming were first developed by Kantorovich [38]. Detailed justifications of the simplex method described here can be found in special monographs devoted to linear programming.


We shall describe still another general scheme of variational problems, to which many problems of so-called dynamic programming (or dynamic planning) are reduced [7]. The essence of this scheme is as follows: in some (generally speaking, multidimensional) Euclidean space, certain limitations single out a class of curves which we shall term trajectories. On the set of all possible trajectories there is given some estimating function (or, as we usually say, functional). The problem consists in finding the trajectory on which the value of this functional is greatest or least.
Let us consider one of the quite general numerical (approximate) methods of solving the indicated problem, developed by Mikhalevich and Shor: the method of sequential analysis of variants. We shall discuss this method as applied to one of the simplest cases, when the basic space is two-dimensional, the class K of trajectories consists of all the piecewise-smooth curves connecting two fixed points A and B of the space and contained wholly in some fixed finite region Q, and the estimating functional F has the property of additivity. The additivity property consists in the functional F being defined not only on whole trajectories but also on any of their pieces (open subsets and their closures), and, when two disjoint pieces are combined into one, the corresponding values of the estimating functional are added:

$$F(L_1 \cup L_2) = F(L_1) + F(L_2).$$

The formulated conditions correspond to the dynamic programming 
problem in the Bellman formulation. 

The first step in the method of sequential analysis of variants is the limitation of the class K: of all the trajectories connecting the points A and B we single out only certain broken lines. This is done by drawing several sections (straight lines in the present case) perpendicular to the segment AB and intersecting it. The vertices of the broken lines considered can be located only on the selected sections. Further, each section (within the limits of the region Q) is approximated by a finite set of points (possible vertices of the broken lines). The density of the points of the approximating sets, and the density of the sections, are chosen according to the required accuracy of the problem solution. A graphic impression of the required constructions is given by Fig. 16.

In Fig. 16 the boundary of the region Q is shown by the dashed curve, the sections are denoted by Roman numerals, and the points of the sets which approximate the sections are denoted by Arabic numerals. If the number of points in the ith section is denoted by $n_i$ and the total number of sections by m, then the total number N of broken lines corresponding to the postulated conditions is, as is easily seen, the product of the numbers $n_i$: $N = n_1 n_2 \cdots n_m$. This quantity increases very rapidly as the number of points and sections increases, with the result that sorting all the variants is practically impossible.


[Fig. 16]

However, we can abbreviate the sorting by using the following technique. Let us connect the point A by straight-line segments with all the points of the first section and calculate the value of the estimating functional F on all these segments. To each point j of the first section we assign the value $F_j^1$ of the functional F on the segment connecting this point with the point A. For each point k of the second section we find the point $j_k$ of the first section such that the value of the estimating functional F on the broken line $A j_k k$ is smallest in comparison with its values on all the other permissible broken lines connecting the point A with the point k. Since the functional F is additive, the question reduces to the minimization of the sum $F_j^1 + F(j, k)$, where $F(j, k)$ is the value of the functional F on the segment $[j, k]$. The corresponding (minimal among all possible) value of the functional F on the broken line $A j_k k$ is denoted by $F_k^2$. To find it we make use of the already found values of the functional F at the points of the preceding (first) section, which are necessarily minimal here, and the sorting is limited to all possible segments connecting the points of the first and second sections.

[Fig. 17]

Let us assume that for all the points p of the ith section we have already found the minimal values $F_p^i$ of the functional F on all the permissible broken lines connecting these points with the point A. Let us consider the portion between the ith and the (i + 1)th sections (Fig. 17). For each point q of the (i + 1)th section we find the point $p_q$ of the ith section such that the sum $F_{p_q}^i + F(p_q, q)$ is minimal. It is evident that the minimal value of this sum is the minimal possible value $F_q^{i+1}$ of the estimating functional F on all the permissible broken lines connecting the point A with the point q. Recording this value $F_q^{i+1}$ and letting the point q run through all the points of the (i + 1)th section, we can proceed (on the basis of completely analogous constructions) to the consideration of the portion between the (i + 1)th and the (i + 2)th sections. From the consideration of the portion between the ith and the (i + 1)th sections we need to retain, besides the values $F_q^{i+1}$, only the function $\varphi^i(q)$, which assigns to each point q of the (i + 1)th section the point $p_q = \varphi^i(q)$ of the ith section with which it connects most favorably. In the case shown in Fig. 17, $\varphi^i(1) = 2$, $\varphi^i(2) = 3$, $\varphi^i(3) = 4$, $\varphi^i(4) = 5$, $\varphi^i(5) = 3$.

Repeating the indicated process, we finally come to the consideration of the portion between the last (mth) section and the final point B. Having found for the point B the point $r = \varphi^m(B)$ of the mth section with which it connects most favorably, we can use the recorded functions $\varphi^i(x)$ ($i = 1, 2, \ldots, m$) to find the best trajectory (admissible broken line) $A k_1 k_2 \ldots k_m B$ connecting the points A and B: the points of this broken line are determined sequentially (from right to left) by the relations $k_i = \varphi^i(k_{i+1})$ ($i = m, m - 1, \ldots, 1$), where the point B is taken as $k_{m+1}$. It is easy to see that the broken line found actually minimizes the estimating functional F. Here the sorting used in finding the broken line is limited to only $n_1 + n_1 n_2 + n_2 n_3 + \ldots + n_{m-1} n_m + n_m$ variants instead of the $n_1 n_2 \cdots n_m$ variants of complete sorting ($n_i$ is the number of points in the ith section).
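The whole method of sequential analysis of variants for the planar case fits in a short function. The representation of sections as lists of points, the use of Euclidean segment length as the additive functional in the example, and the function name are assumptions for illustration; the function also counts how many segments it evaluates. Indices are zero-based, so `phi[i]` plays the role of $\varphi^{i+1}$ in the text.

```python
import math


def sequential_analysis(A, B, sections, cost):
    """Method of sequential analysis of variants for broken lines
    A -> section 1 -> ... -> section m -> B with an additive functional.

    `cost(P, Q)` is the value of the functional on the segment [P, Q].
    Returns (best broken line, its value, number of segments evaluated).
    """
    evaluations = 0
    F = []                                  # F[j]: best value from A to point j of the current section
    for p in sections[0]:
        F.append(cost(A, p))
        evaluations += 1
    phi = []                                # phi[i][q]: index in sections[i] of the best predecessor of q
    prev = sections[0]
    for layer in sections[1:] + [[B]]:      # B is treated as a one-point final "section"
        new_F, choice = [], []
        for q in layer:
            best_value, best_p = math.inf, None
            for idx, p in enumerate(prev):
                value = F[idx] + cost(p, q)
                evaluations += 1
                if value < best_value:
                    best_value, best_p = value, idx
            new_F.append(best_value)
            choice.append(best_p)
        phi.append(choice)
        F, prev = new_F, layer
    # Backtrack from B: k_m is the best predecessor of B, then k_i from k_{i+1}.
    path, q = [B], 0
    for i in range(len(sections) - 1, -1, -1):
        q = phi[i][q]
        path.append(sections[i][q])
    path.append(A)
    return path[::-1], F[0], evaluations


A, B = (0.0, 0.0), (4.0, 0.0)
sections = [[(x, y) for y in (-1.0, 0.0, 1.0)] for x in (1.0, 2.0, 3.0)]
path, value, count = sequential_analysis(A, B, sections, math.dist)
print(path, value, count)   # straight line along y = 0, value 4.0, 24 segments evaluated
```

For three sections of three points each the count is 3 + 9 + 9 + 3 = 24 segments against 27 complete broken lines; with ten sections of ten points each it is 920 against $10^{10}$.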

The described method generalizes directly to the case of multidimensional spaces. The requirement of additivity of the estimating functional F is also not strictly necessary. It is obviously sufficient to assume that for any initial piece APQ of the trajectory the value F(APQ) of the functional can be represented in the form $F(APQ) = f(F(A, P), F(P, Q))$, where f(x, y) is a real function which does not decrease with respect to x for any value of y.

We note further that in the majority of cases the sorting of the different variants of connecting the points of two neighboring sections can be considerably reduced by various sorts of limitations (for example, limitations on the maximal slope of the segments of the broken line relative to the x axis in the two-dimensional case). In several cases the sorting can be reduced by using the continuity of the estimating functional. More complex constructions in the method of sequential analysis of variants are associated with a special systematization of the limitations and functionals which permits the construction of algorithms for the search for the optimal trajectory by sequential detection and discarding of "unpromising" segments of the trajectories.
Chapter 5
ELECTRONIC DIGITAL MACHINES AND PROGRAMMING 

§1. THE UNIVERSAL PROGRAM AUTOMATON 

One of the most significant technical achievements of our time has been the creation of universal program automata, i.e., automatic information processors which make it possible to realize any algorithm. The modern universal electronic (digital) computers are such automata. It is interesting to note that, as indicated by their very name, these machines were created for the purpose of automating computations, more exactly, for automating the performance of any computational algorithm. The term "universal" as applied to these machines was understood by the creators of the first universal computers (and is still understood by many today) in the sense of universality specifically with respect to computational algorithms.

However, since any algorithm can be reduced, as we noted in Chapter 1, to the calculation of some partial recursive (arithmetic) function, universality with respect to computational algorithms turns out to be universality in general. This circumstance is of great practical and theoretical importance, since the basis of any field of human activity is in fact the processing of information in accordance with particular, frequently very complex, sets of algorithms.

The availability of universal automatic information processors such as the modern universal electronic computers makes it possible, at least in principle, to automate any field of human activity which is based on the processing of information. This may be the solution of complex design problems, planning, production control, translation from one language to another, composition of music, playing chess, and many others. It is curious that the tremendous possibilities inherent in the universal electronic machines not only were not recognized by their first designers, but were even disputed by some of them.

In this connection we must note still another error which is common among individuals who are not familiar with the theory of algorithms. The idea is prevalent that the amazing properties of the modern electronic digital machines are based on some specific characteristics of the elements used in these machines: the electron tubes, transistors, etc. In actuality, electronics in itself has no bearing on their theoretical (qualitative) capabilities.

The essence of the matter lies in the specific control principle and in the set of operations which these machines can perform, while the elements from which they are constructed can be of quite varied physical nature and can, in particular, be purely mechanical. Electronic elements are used to increase significantly the operating speed of the computers and also to improve their reliability (per fixed number of operations performed by the machine). We note that the first universal digital computer (Mark-1) was built using electromechanical elements (electromagnetic relays) rather than electronic elements.

The control principle which provides for the algorithmic universality (the capability of realizing any algorithm) of the modern universal digital machines is a development and generalization of the principle underlying the algorithmic scheme of Post described in Chapter 1. Just as in the Post scheme, the information in a universal digital machine is stored in a memory which is divided into individual cells (memory cells). However, in contrast with the Post scheme, each cell can store not a single binary digit (0 or 1) but an entire word composed of a considerable number (usually 30-40) of binary digits. We can, if convenient, consider these words as letters of some finite alphabet; however, this alphabet will contain, as a rule, a very large number of letters ($2^{30}$ to $2^{40}$).

Therefore we usually prefer to consider the contents of each memory
cell not as an individual letter, but as a word in a binary alphabet.
The binary digits (0 and 1) composing this word are usually
termed (binary) places, and the word itself is termed a binary code,
sometimes simply a (binary) number. We can, of course, take as
letters not the contents of the individual binary places, but
some combinations of these places. For example, any binary code whose
length is a multiple of three can be considered as a number in the octal
notation system, the octal digits being designated by triads of binary digits:
000-0, 001-1, 010-2, 011-3, 100-4, 101-5, 110-6, 111-7. Using tetrads
(groups of four binary digits), of whose sixteen combinations only ten
are needed, to designate the decimal digits, we can represent binary
codes consisting of such tetrads as numbers in the decimal notation system.
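The two groupings of places just described can be sketched in Python as follows. This is only an illustration under stated assumptions: codes are modelled as strings of the characters 0 and 1, the function names are hypothetical, and the tetrads are assumed to carry the natural weights 8-4-2-1 (the text does not specify which ten of the sixteen tetrads designate the decimal digits).

```python
def binary_to_octal(code: str) -> str:
    """Read a binary code in triads of places and name each triad by an octal digit."""
    width = -(-len(code) // 3) * 3          # round the length up to a multiple of three
    code = code.zfill(width)                # pad with zeros on the left (higher places)
    return "".join(str(int(code[i:i + 3], 2)) for i in range(0, width, 3))


def decimal_to_tetrads(number: int) -> str:
    """Encode each decimal digit of a non-negative number as a tetrad (8-4-2-1 weights assumed)."""
    return " ".join(format(int(digit), "04b") for digit in str(number))


def tetrads_to_decimal(code: str) -> int:
    """Decode space-separated tetrads; only 0000..1001 designate decimal digits."""
    digits = []
    for tetrad in code.split():
        value = int(tetrad, 2)
        if value > 9:
            raise ValueError(f"tetrad {tetrad} does not designate a decimal digit")
        digits.append(str(value))
    return int("".join(digits))


print(binary_to_octal("101110001"))   # 561
print(decimal_to_tetrads(1962))       # 0001 1001 0110 0010
```

Six of the sixteen tetrads (1010 to 1111) never occur in such a code, which is the price paid for reading the code directly in decimal.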

In addition to the replacement of binary digits by multiplace
binary codes, there is a second essential difference between the
organization of the memory (or storage device) of the universal digital
machine and the memory for the algorithms in the Post scheme. Being
an abstract algorithmic scheme, the Post scheme assumed the existence
of an infinite number of cells, i.e., a memory of unlimited volume.
In real technical devices such as the universal digital machines, on
the other hand, the size of the memory is necessarily limited.

In the modern large universal digital machines the size of the
high-speed memory does not exceed 100,000 cells (usually it is no more
than 4000-8000). This must be kept in mind, since it is closely related
to the concept of machine universality. Strictly speaking, to be able
to realize any algorithm, the universal digital machine must
accommodate the writing (representation) of this algorithm in its
memory. Since the representation of an algorithm can be arbitrarily
long, the machine memory would have to be infinite for the machine
actually to be capable of realizing any algorithm.

Keeping in mind, however, that an infinite memory cannot be realized
in any technical device, it is customary to term a machine universal
if the organization of its control and its set of operations would
provide the possibility of realizing any algorithm under the condition
of unlimited memory size.

In practice the universality of the modern machines is provided
by the fact that, in addition to the high-speed (so-called operational)
memory device, they are also equipped with relatively slow (so-called
external) memory devices which are capable of exchanging information
with the operational memory device. The capacity of the external
memory (usually composed of magnetic tapes) can be considered
practically unlimited, and this (together with the possibility of
exchanging codes with the operational memory) determines the practical
possibility of performing any algorithm on the machine.

The sequence of operations performed by the universal digital machine
is determined by the program established in its memory. The program is
an ordered finite set of instructions (orders), which can be considered
a natural generalization of the orders used in the construction of the
Post algorithmic scheme. In contrast with the Post scheme, in which the
active cell is displaced by no more than one step to the right or left
with the performance of each succeeding order, in the universal
digital machine provision is made for arbitrary variation of the
position of the active cell from order to order. To do this, each order
contains the numbers of one or several memory cells which are active
during the performance of the given order.

The numbers of the memory cells in the universal digital machines
are customarily termed addresses. The number of addresses in the
orders of the modern universal digital machines (the number of memory
cells which are active in the performance of any of these orders) usually
varies in the range from 1 to 4. Correspondingly, we distinguish
single-address, two-address, three-address and four-address orders.

Let us first consider machines with a four-address system of
orders, i.e., those machines in which the maximal number of addresses
in an order equals four. Different types of orders correspond to
different operations which can be performed by the machine. The orders
are usually recorded in the machine in the form of binary codes which
can be stored in the machine memory (both operational and external).

We will assume that in each memory cell there can be contained 
either one order, also termed command or command word, or one informa- 
tion word. Just as was done above in the case of the information words, 
each command word (order code) can if desired be considered as a word 
in any finite (not necessarily binary) alphabet. 

Any command word is divided into operational and address parts.
The operational part contains the code of the operation which is
performed during the action of the order represented by this command
word. All orders of the same type have identical operational parts.
The address part of the order contains the addresses of the cells
which are active during the action of the order.

The operations performed by the universal digital machines are
usually divided into several classes. The first class includes the
arithmetic operations: addition, subtraction, multiplication and
division. The four-address orders for the performance of the arithmetic
operations usually have the following structure. The operational part
of the order contains a code number designating the particular
operation (for example, one for addition, two for subtraction, three for
multiplication, four for division). The first two addresses in the order
indicate the memory cells which store the numbers on which the
operation is to be performed, i.e., the addresses of the addends in the
case of addition, the addresses of the minuend and subtrahend in the
case of subtraction, etc. The third address of the order indicates the
cell to which the result of the operation (sum, difference, product or
quotient) is transferred. Finally, the fourth address of the order
indicates the memory cell which stores the order to be performed after
the given order.
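A minimal Python sketch of how one such four-address arithmetic order might be carried out. It assumes, for simplicity, that memory is a Python list of integers and that the order is kept as a tuple rather than as a binary code stored in memory; the operation codes follow the example above, and taking division as an integer quotient is an assumption of the sketch, not of the text.

```python
from typing import Callable, Dict, List, Tuple

# Operation codes taken from the example in the text.
ARITHMETIC: Dict[int, Callable[[int, int], int]] = {
    1: lambda x, y: x + y,   # addition
    2: lambda x, y: x - y,   # subtraction
    3: lambda x, y: x * y,   # multiplication
    4: lambda x, y: x // y,  # division (integer quotient: an assumption of this sketch)
}

Order = Tuple[int, int, int, int, int]  # (operation code, a1, a2, a3, a4)


def perform_arithmetic(memory: List[int], order: Order) -> int:
    """Perform one four-address arithmetic order; return the address of the next order."""
    op, a1, a2, a3, a4 = order
    # a1, a2: cells holding the operands; a3: cell receiving the result.
    memory[a3] = ARITHMETIC[op](memory[a1], memory[a2])
    # a4: cell holding the order to be performed next.
    return a4


memory = [50, 8, 0, 0, 0, 0, 0, 0]
next_address = perform_arithmetic(memory, (2, 0, 1, 2, 5))   # cell 2 := 50 - 8
print(memory[2], next_address)   # 42 5
```

Because the fourth address is returned rather than computed, the orders of a program may lie anywhere in memory; this is the forced succession of orders.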

The orders for the performance of the logical operations are constructed
just as in the case of the arithmetic operations. The logical
operations are as a rule two-place operations performed place-by-place,
i.e., separately for each pair of corresponding binary places
of the codes participating in the operation. These include, for example,
placewise conjunction (logical multiplication) and placewise disjunction
(logical addition). There are also single-place logical operations
on the codes. Such operations include the right and left (logical)
shifts, which transform the binary code $x_1 x_2 \ldots x_n$ into the codes
$0\, x_1 x_2 \ldots x_{n-1}$ and $x_2 x_3 \ldots x_n\, 0$ respectively.
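The placewise operations and the two shifts can be sketched in Python, assuming for illustration that a code $x_1 x_2 \ldots x_n$ is held as a string of the characters 0 and 1 with $x_1$ on the left; the function names are hypothetical.

```python
from typing import Callable


def placewise(op: Callable[[int, int], int], x: str, y: str) -> str:
    """Apply a two-place logical operation separately to each pair of corresponding places."""
    if len(x) != len(y):
        raise ValueError("codes must have the same number of places")
    return "".join(str(op(int(a), int(b))) for a, b in zip(x, y))


def conjunction(x: str, y: str) -> str:
    return placewise(lambda a, b: a & b, x, y)   # logical multiplication


def disjunction(x: str, y: str) -> str:
    return placewise(lambda a, b: a | b, x, y)   # logical addition


def right_shift(x: str) -> str:
    return "0" + x[:-1]        # x1 x2 ... xn  ->  0 x1 ... x(n-1)


def left_shift(x: str) -> str:
    return x[1:] + "0"         # x1 x2 ... xn  ->  x2 ... xn 0


print(conjunction("1100", "1010"), disjunction("1100", "1010"))  # 1000 1110
print(right_shift("1011"), left_shift("1011"))                   # 0101 0110
```

Both shifts keep the number of places fixed: one digit is lost at one end and a zero enters at the other.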
A special role is played by the so-called control transfer operations,
which serve to vary the order of performance of the program orders
as a function of the results obtained in the realization of the program.
A typical control transfer operation (also termed a conditional
transfer operation) is the so-called operation of conditional transfer
on exact coincidence of words. The first two addresses of the order
which realizes this operation indicate the memory cells from which the
two words being compared with one another are taken. If these words
coincide (are equal), the next order is taken from the memory cell
indicated by the third address; if they do not coincide, it is taken
from the cell indicated by the fourth address of the conditional
transfer order. Conditional transfers of other forms are also possible,
for example, conditional transfer on the basis of the sign of the
difference of two words or on the basis of the sign of a single word
(in the latter case, of course, three rather than four addresses in the
conditional transfer order are sufficient).
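The selection of the next order by a conditional transfer can be sketched in Python. In this sketch the operation part is omitted and only the addresses are passed; for the sign-based variant, sending a negative word to the second address is an arbitrary assumption, since the text does not fix which branch corresponds to which sign.

```python
from typing import List, Tuple


def transfer_on_coincidence(memory: List[int], addresses: Tuple[int, int, int, int]) -> int:
    """Four-address conditional transfer on exact coincidence of words.

    Compares the words in cells a1 and a2 and returns the address of the next
    order: a3 on coincidence, a4 otherwise.
    """
    a1, a2, a3, a4 = addresses
    return a3 if memory[a1] == memory[a2] else a4


def transfer_on_sign(memory: List[int], addresses: Tuple[int, int, int]) -> int:
    """Three-address variant on the sign of one word (negative -> a2 is an assumption)."""
    a1, a2, a3 = addresses
    return a2 if memory[a1] < 0 else a3


memory = [5, 5, -3]
print(transfer_on_coincidence(memory, (0, 1, 10, 20)))  # 10
print(transfer_on_coincidence(memory, (0, 2, 10, 20)))  # 20
print(transfer_on_sign(memory, (2, 7, 9)))              # 7
```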

The memory of the universal digital machines is usually arranged
so that when a word is selected (read) from any cell for the
performance of a particular operation, a sort of bifurcation of this
word takes place: one copy of it goes to the corresponding device for
the performance of the operation, while the other remains in the cell
from which the selection was made. When a new word is written into a
particular memory cell, the information previously contained in this
cell is automatically destroyed (erased).

Taking account of these properties of the memory of the
universal digital machines, it is not difficult to see that for the
performance of any Post algorithm it is sufficient to use only
the operation of (algebraic) addition, one of the conditional transfer
operations (for example, the operation of conditional transfer on
exact coincidence of words), and the operation of machine stop.

Indeed, let us agree to operate with only two information
words $p_0$ and $p_1$, the first being identified with zero and the second
with unity of the binary alphabet of a given Post algorithm A.
Let us divide the memory of the universal digital machine M under
consideration into three parts. The first part consists of only five
cells, $a_{-1}$, $a_0$, $a_1$, $b_0$, $b_1$, in which the words $-1$, $0$, $1$, $p_0$, $p_1$
are placed; the cells of the second part hold the program which
simulates the program (scheme) of the algorithm A; finally, the third
part of the memory of the machine M simulates the information tape of
the algorithm A.

When the algorithm A operates on a specific input word p on
which it is defined, the original, intermediate and final
information occupies only a limited (finite) part of the information
tape, since the algorithm operates for only a finite number of steps
and at each step writes information into no more than one new cell.
Therefore, if the memory of the machine M is sufficiently large, we can
place the required portion of the information tape in the third part
of memory identified above. By the convention adopted above, a possible
insufficiency of memory size need not be taken into account when
deciding whether particular algorithms are theoretically
representable on the machine.

Let us turn to the direct simulation of the orders of the Post
algorithm A by the orders of the machine M.

As noted in §4 of Chapter 1, six different types of orders may be
encountered in the Post algorithms. The order of the sixth type
(stop) is simulated directly by the corresponding order of the machine
M. The orders of the first two types (writing zero or unity in the
cell being considered) are simulated by orders of the machine M
which transfer information from the cell $b_0$ or $b_1$
into the active cell, indicated, for example, by the third address of
the machine order. Clearly, this transfer can be accomplished
by an addition order whose first two addresses hold the
pair of addresses of the cells $a_0$ and $b_0$ or of the cells $a_0$ and $b_1$,
while the third address holds the address of the active cell
(the fourth address, just as in the Post algorithm, indicates
the address of the order to be performed after the present order).

The Post order of the fifth type is simulated, as is easily seen,
by the machine order of conditional transfer on exact coincidence of
words. It is sufficient to compare the word in the cell $b_1$ with the
word in the active cell and transfer to one of the two orders according
to the result of this comparison.

Finally, each Post order q' of the third or fourth type is simulated
by a group of machine orders whose number is equal to the
number of orders of the first, second and fifth types in the Post
algorithm A under consideration. Indeed, let r' be any order of the
first, second or fifth type of the algorithm A. By what was said above,
it is simulated by a single machine order, which we denote by r. Let
the Post order q' displace the active cell one unit to the right. In the
machine program this displacement can be accomplished only by
increasing by 1 the addresses of all the active cells.

Such active cells, however, are referenced only in the machine orders
which simulate the Post orders of the first, second and fifth types. By
suitable numbering of the addresses we can, without loss of generality,
assume that the addresses of the active Post cells are written in some
definite address of the order, say the last one. Regarding the codes as
whole binary numbers, we conclude that to change the address of the
active cell in the order r by +1 it is sufficient to add to the code of
the command r the code +1 located in the cell $a_1$. The shift of the
active cell to the left is accomplished analogously, using the code $-1$
from the cell $a_{-1}$.
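The readdressing trick can be sketched in Python by packing an order into one whole binary number. The field names, their order and the 12-place field width are hypothetical; the only assumption that matters is the one made in the text, namely that the active-cell address occupies the last (here, lowest-order) field, so that ordinary addition of the constants from $a_1$ and $a_{-1}$ changes just that field. The sketch ignores carries or borrows out of the field, i.e. it assumes the active address stays within 0 to 4095.

```python
from typing import Dict

FIELD_BITS = 12                                              # hypothetical field width
FIELDS = ("operation", "first", "second", "next", "active")  # last = lowest-order field
MASK = (1 << FIELD_BITS) - 1


def encode(order: Dict[str, int]) -> int:
    """Pack an order into one whole binary number, the active-cell address lowest."""
    code = 0
    for name in FIELDS:
        code = (code << FIELD_BITS) | (order[name] & MASK)
    return code


def decode(code: int) -> Dict[str, int]:
    """Unpack a code produced by encode()."""
    order = {}
    for name in reversed(FIELDS):
        order[name] = code & MASK
        code >>= FIELD_BITS
    return order


# Constant cells of the first part of memory: a_{-1} = -1, a_1 = +1.
A_MINUS_1, A_PLUS_1 = -1, 1

r = encode({"operation": 1, "first": 2, "second": 3, "next": 40, "active": 100})
r = r + A_PLUS_1            # Post order "move right": ordinary addition of the code +1
print(decode(r)["active"])  # 101
r = r + A_MINUS_1           # Post order "move left": addition of the code -1
print(decode(r)["active"])  # 100
```

A single addition order thus performs the readdressing; no special operation is needed beyond the algebraic addition already available.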

From what we have said it is clear that the algorithmic universality
of any program-controlled digital automaton will be ensured if the
operations it performs allow us to accomplish the following four
operations:

1) the operation of the transfer of the contents of any memory 
cell to any other memory cell; 

2) the operation of adding to the code of an order located
in any memory cell constants which alter the value of a given
(first, second, third or fourth) address of the order by plus 1 or
minus 1;

3) the operation of conditional transfer on exact word coinci- 
dence; 

4) the operation of (unconditional) machine stop. 

In the case considered above, operations 1) and 2) are provided 
by the same machine operation — the operation of algebraic addition. 
Usually, however, in the universal digital machines these operations 
are separated, the second being termed the readdressing operation or 
command addition. 

Of course, in addition to the indicated operations, the set of
operations of every universal digital machine must include the
operations of input and output of information, and also (when an
external memory is used) the operations which provide two-way
exchange of information between the operational and external memory
devices.

Let us consider ways of reducing the number of addresses in the
orders. First of all, it is not difficult to see that the fourth address,
used to indicate the succeeding order, can be eliminated by positioning
the orders in the machine memory so that the address of the order to
be performed after any given order p is always larger by unity than
the address of the order p itself. In other words, the fourth address
becomes unnecessary provided that the arrangement of the instructions
in the memory cells corresponds to the order of their performance by
the machine.

The usual (natural) order of succession of instructions can be
violated only when the instruction being performed is a conditional
transfer command. As we noted above, in the four-address system of
instructions one address indicates the following instruction when the
condition on which the conditional transfer is based is not satisfied,
and another address indicates the following instruction when this
condition is satisfied. When the four-address system of instructions
is replaced by a three-address system, it is usually assumed that in
the first case (nonfulfillment of the condition) the conditional
transfer instruction is followed by the instruction written in the
next memory cell in order, and only in the second case (fulfillment of
the condition) is one of the addresses used to indicate the instruction
which must be performed next.

It follows that the fourth address can be made redundant not only in
the ordinary instructions, which do not alter the subsequent order of
performance of the instructions, but also in the conditional transfer
instructions. The resulting three-address instruction system is usually
termed a system with natural succession of instructions, in contrast
with the previously described four-address system with forced
succession of instructions. The advantage of the latter system lies in
the greater freedom it offers in arranging in the memory device the
sequence of commands which control the operation of the machine. The
advantage of the three-address system is the simpler structure of the
instruction.
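A three-address machine with natural succession of instructions can be sketched in Python as a loop around a counter that normally advances by one cell. Mnemonic operation names stand in for numeric codes, and instructions and data share one memory list; these choices, the sample program, and obtaining an unconditional jump by comparing a cell with itself are all illustrative assumptions.

```python
from typing import List, Union

Cell = Union[int, tuple]


def run(memory: List[Cell], start: int = 0, max_steps: int = 10_000) -> List[Cell]:
    """Three-address machine with natural succession of instructions (a sketch)."""
    counter = start                          # address of the instruction to perform
    for _ in range(max_steps):
        op, a1, a2, a3 = memory[counter]
        counter += 1                         # by default, take the next cell in order
        if op == "add":
            memory[a3] = memory[a1] + memory[a2]
        elif op == "eq":                     # conditional transfer on coincidence
            if memory[a1] == memory[a2]:
                counter = a3                 # only a fulfilled condition uses an address
        elif op == "stop":
            return memory
        else:
            raise ValueError(f"unknown operation {op!r}")
    raise RuntimeError("step limit exceeded")


# Program: multiply 5 by 7 by repeated addition. Cells 10-14 hold data.
memory: List[Cell] = [
    ("add", 12, 11, 12),   # 0: s := s + x
    ("add", 13, 14, 13),   # 1: i := i + 1
    ("eq", 13, 10, 4),     # 2: if i = n, go to 4
    ("eq", 14, 14, 0),     # 3: always true: go back to 0
    ("stop", 0, 0, 0),     # 4: machine stop
] + [0] * 5 + [7, 5, 0, 0, 1]   # 10: n, 11: x, 12: s, 13: i, 14: the constant 1
print(run(memory)[12])   # 35
```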

Further reduction of the number of addresses in the instructions
can be achieved by fixing some supplementary memory cell,
usually structurally separated from the other memory cells of the
machine. Once this cell is fixed, a simple way is opened to reducing
the number of addresses in the instruction to the natural
minimum, i.e., to a single address. Let us clarify this method using
as an example the addition operation, which requires three
addresses: the address $a_1$ of the addend, the address $a_2$ of the augend
and the address $a_3$ to which the sum is to be sent. With the aid of the
fixed supplementary cell $b_0$ this three-address operation can
be performed by the sequential performance of three single-address
operations: the transfer of the number from the cell $a_1$ into the
(fixed) cell $b_0$; the addition of the number contained in cell $a_2$ to
the number in cell $b_0$, with the result written in cell $b_0$; and,
finally, the transfer of the number from cell $b_0$ into cell $a_3$. Since
only the addresses $a_1$, $a_2$, $a_3$ can change here, while the address $b_0$
is fixed once and for all, all three indicated operations are actually
realized with single-address instructions.
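The rewriting of one three-address addition into three single-address orders can be sketched in Python. The mnemonics load, add and store are hypothetical names for the three single-address operations described above, and memory is modelled as a dictionary from addresses to numbers.

```python
from typing import Dict, List, Tuple

SingleOrder = Tuple[str, int]   # (operation, the one address)


def three_to_single(a1: int, a2: int, a3: int) -> List[SingleOrder]:
    """Rewrite the three-address order 'a3 := a1 + a2' as three single-address orders."""
    return [("load", a1), ("add", a2), ("store", a3)]


def run_single(memory: Dict[int, int], program: List[SingleOrder]) -> Dict[int, int]:
    b0 = 0   # the fixed supplementary cell; its address never appears in an order
    for op, address in program:
        if op == "load":       # transfer from cell `address` into b0
            b0 = memory[address]
        elif op == "add":      # b0 := contents of `address` + b0
            b0 = memory[address] + b0
        elif op == "store":    # transfer from b0 into cell `address`
            memory[address] = b0
        else:
            raise ValueError(f"unknown operation {op!r}")
    return memory


memory = {1: 30, 2: 12, 3: 0}
print(run_single(memory, three_to_single(1, 2, 3))[3])   # 42
```

The cost of the shorter instruction is visible here: one three-address order becomes three orders, and the program grows accordingly.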


The described method leads to the single-address system of instructions
which is used in many modern universal electronic digital
machines. Given the single-address system, it is not difficult to
construct the two-address system of instructions as well. In this case
the second address can be used either to indicate the address of the
following instruction (two-address system with forced succession of
instructions) or to indicate the addresses of the number codes
(information words) on which the operations are performed (two-address
system with natural succession of commands).

Every device having a discrete memory which is divided into in- 
dividual cells and whose operation can be controlled with the aid of 
a sequence of command words — instructions — which are arranged in 
certain of these cells is termed a program automaton and the indi- 
cated sequence of instructions itself is termed the program for the 
automaton operation. 

If the set of operations (types of instructions) performed by the
program automaton makes it possible to compose from them the
operation of transferring information words from any memory cell into
any other memory cell, the operation of readdressing (altering the
addresses in the instructions) by +1, the operation of conditional
transfer and the machine stop, and if any finite sequence of operations
from this set can be specified as the program of the automaton, then
such an automaton is termed a universal program automaton.

Up to the limitations introduced by the fixed memory size, the
universal program automaton is capable of reproducing any algorithm,
given a suitable coding of its input and output alphabets. This
conclusion applies not only to the conventional algorithms, but also to
the random and self-altering algorithms (see §5 of the present chapter).

§2. STRUCTURE OF THE MODERN UNIVERSAL PROGRAM AUTOMATA

The modern universal program automata consist of five different
basic devices: the memory unit (MU), the arithmetic unit (AU), the
control unit (CU), the input unit (input) and the output unit (output).
As we noted in the preceding section, the memory unit (memory) serves
for memorizing and storing the program for the automaton operation,
and also the initial, final and intermediate information. The input
unit serves for entering the program and the initial information
(conditions of the problem) into the automaton memory; the output unit
serves for output from the memory of the final information (the answer
to the problem posed to the automaton).

The arithmetic unit, as its name indicates, serves for the performance
of arithmetic operations. However, the arithmetic unit is usually
also used for the performance of other operations, logical ones for
example. In this connection it would be more accurate to term the
arithmetic unit an operational unit. However, we shall not deviate
from the established terminology.

Finally, the control unit combines and coordinates the operation
of all the remaining devices of the universal program automaton and
accomplishes the selection, decoding and organization of the
performance of the instructions composing the program. In the modern
universal program automata the control unit is constructed on the
cyclic principle. The essence of this principle is that the operation
of the automaton in time is broken down into natural intervals, termed
operating cycles, in the course of each of which approximately the
same sequence of elementary operations is repeated.

The determination of the beginning and end of the operating cycle
is to a certain degree arbitrary, since both can be shifted
simultaneously in either direction. We will assume that the operating
cycle begins when the command (instruction) subject to performance has
already been transmitted to the control unit. In the course of the
cycle this command is performed: its operational part is used to ready
certain operations of both the control unit itself and the arithmetic
unit for performance. The address part of the command is used to
excite the corresponding cells of the MU in order to extract
information from them or enter information into them. In multi-address
systems the various addresses of the command are decoded sequentially.
The operating cycle terminates with the extraction from the MU and the
transmission to the CU of the code of the next instruction subject to
performance.

We note that the CU not only transmits information to the other units
but also receives information from them: the command code from the MU,
and from the AU the result of checking the condition that determines
the transfer to one or the other of the two commands which may follow
a conditional transfer command, as well as certain other signals.
As we mentioned in the preceding section, in addition to the
operational memory unit (OMU) the modern universal program automata
are also equipped with an external memory unit (EMU), which is slower
than the OMU but has a larger capacity. The block diagram of a
universal program automaton which defines the interaction (information
exchange) between its basic units is shown in Fig. 18.

To refine the block diagram, let us consider in more detail the
structure of the individual units composing it, primarily the OMU,
AU and CU. Essential component parts of all three of these devices are
the so-called registers. A register is a memory cell intended for
storing one information or command word. However, in contrast with the
usual memory cells, which can be accessed only after rather complex
preliminary commutation (switching), the registers are particularly
accessible memory cells whose inputs and outputs are directly
connected to the circuits which transmit the information.

Depending on the method of functioning of these circuits, universal
program automata (universal digital machines) are divided into
two major classes: the series and the parallel machines. In the
parallel machines (automata), when a code is transmitted from register
to register, all the digits of this code are transmitted simultaneously,
while in the series machines they are transferred sequentially, one
after the other. Clearly, the parallel machines, other conditions being
the same, will be faster than the series machines, although they
require a larger number of parallel channels (equal to the number of
digits in the machine codes of the words) for transmitting information
between the registers, while in the series machines we can limit
ourselves to one such channel.





Fig. 18. 1) External memory unit; 2) operational memory unit;
3) input; 4) output; 5) arithmetic unit; 6) control unit.
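The difference between the two classes of machines can be sketched in Python by counting clock ticks for one register-to-register transfer. This is only a counting model: circuit delays are ignored, codes are lists of places, and sending the lowest place first in the series case is an assumption.

```python
from typing import List, Tuple


def parallel_transfer(code: List[int]) -> Tuple[List[int], int]:
    """All places travel at once over len(code) channels: one clock tick."""
    return list(code), 1


def serial_transfer(code: List[int]) -> Tuple[List[int], int]:
    """Places travel one after another over a single channel: one tick per place."""
    received: List[int] = []
    ticks = 0
    for bit in reversed(code):       # lowest place first (an assumption)
        received.insert(0, bit)      # the receiving register fills from the right
        ticks += 1
    return received, ticks


code = [1, 0, 1, 1, 0, 1]
print(parallel_transfer(code))   # ([1, 0, 1, 1, 0, 1], 1)
print(serial_transfer(code))     # ([1, 0, 1, 1, 0, 1], 6)
```

Both transfers deliver the same code; the series machine saves channels at the cost of a transfer time proportional to the number of places.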


The arithmetic unit of the modern universal electronic digital
machines usually consists of three registers, one of which is
capable of summing the numerical codes transmitted to it and is
therefore termed a summator. The numerical codes with which the
arithmetic unit operates are numbers of differing signs, and the
summation in question is understood as algebraic addition (taking
account of the signs). In the parallel machines the summation operation
is usually performed in two elementary cycles of the machine. Here by
an elementary cycle we mean the interval of time between two successive
clock pulses applied to the AU. Most frequently the source of the
clock pulses is the synchronizing generator, which is common to the
entire machine and is part of its control unit. There are also other
methods of organizing the elementary cycles, used in the
so-called asynchronous machines. These methods are described in detail
in handbooks on electronic digital machines.

The operations performed in the various parts of the universal
program automaton in the course of a single elementary cycle are
customarily termed microoperations. More complex operations, performed
over several elementary cycles, are realized with the aid of a set of
microoperations termed the microprogram of the given operation. The
microprogram of the summation operation in the arithmetic units of the
parallel machines usually consists of two microoperations: the
microoperation of place-by-place addition and the microoperation which
realizes the carries from some places to others that arise as a result
of the place-by-place addition. In this case it is assumed that one of
the addends was established on the summator beforehand and the other
on one of the AU registers.
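The two microoperations of summation can be sketched in Python on non-negative codes (the sign is left out). The first step is place-by-place addition modulo 2, which also marks the places where carries arise; the second realizes those carries. In the machine the second step is a single microoperation; the sketch simply iterates it until no carries remain, and overflow out of the highest place is discarded.

```python
def add_by_microoperations(x: int, y: int, places: int = 16) -> int:
    """Add two non-negative codes of `places` binary places (sign handling omitted)."""
    mask = (1 << places) - 1
    # Microoperation 1: place-by-place addition modulo 2; note where carries arise.
    partial = (x ^ y) & mask
    carries = (x & y) & mask
    # Microoperation 2: realize the carries into the next higher places.
    # Here it is iterated explicitly until no carries remain.
    while carries:
        shifted = (carries << 1) & mask
        partial, carries = partial ^ shifted, partial & shifted
    return partial


print(add_by_microoperations(13, 29))   # 42
```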

We can, moreover, construct the AU so that, after the addends have
been set on the register and the summator, the addition is
accomplished by a single microoperation: the transfer of the augend
from the register into the summator. In what follows we shall assume
that the AU under discussion is constructed in just this way.
Summators of such AU are customarily termed accumulators, since they
are capable of accumulating the sum of any number of terms as a result
of the sequential transmission of all the terms, one after another, to
the summator.

Making use of the fact that the summator performs algebraic
addition (taking account of the signs of the addends), it is not
difficult to organize subtraction on such a summator as well, by
transferring (from register to summator) the code of the subtrahend as
an ordinary addend but with its sign reversed. We therefore introduce
into the set of microoperations of the universal digital machines not
only the microoperation of conventional (direct) transfer of number
codes, but also the transfer of a code with its sign changed. We must
also provide for the microoperations of register clearing, as a result
of which the cleared registers hold the number code which represents
the number 0.
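An accumulator built from these microoperations can be sketched in Python. The class and method names are hypothetical; the point shown is that clearing, direct transfer and transfer with the sign changed suffice both to accumulate any number of terms and to subtract.

```python
from typing import List


class ArithmeticUnit:
    """A summator used as an accumulator plus one register (a simplified model)."""

    def __init__(self) -> None:
        self.summator = 0
        self.register = 0

    def clear(self) -> None:
        self.summator = 0          # register clearing: establish the code of 0
        self.register = 0

    def transfer(self) -> None:
        self.summator += self.register       # direct transfer = algebraic addition

    def transfer_negated(self) -> None:
        self.summator += -self.register      # transfer with the sign changed


def accumulate(added: List[int], subtracted: List[int]) -> int:
    """Sum `added` and subtract `subtracted` using only transfers into the summator."""
    au = ArithmeticUnit()
    au.clear()
    for term in added:
        au.register = term
        au.transfer()
    for term in subtracted:
        au.register = term
        au.transfer_negated()
    return au.summator


print(accumulate([10, 20, 30], [8]))   # 52
```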

For the performance of the operations of multiplication and division
with the natural method of coding numbers, the described
microoperations are not sufficient. Therefore, along with the
microoperations already described, we also introduce into the set of
microoperations of the universal digital machines the microoperations
of left and right shift on the registers. When the left shift
microoperation is performed, the number code $x_1 x_2 \ldots x_n$ set in the
register is replaced by the code $x_2 x_3 \ldots x_n\, 0$, and when the right
shift microoperation is performed, by the code $0\, x_1 x_2 \ldots x_{n-1}$. The
sign of the code (not specially designated here) retains its value. Here
the code digits on the right usually represent the lower digits of the
number and the digits on the left its higher digits. Therefore we
also say that with a right shift the number code is shifted in the
direction of the lower digits, and with a left shift in the direction
of the higher digits.

As the feedback signals transmitted from the AU to the CU we usually
select the signal giving the sign of the number code in the summator
and the signal giving the digit in the lowest place of the number code
in one of the AU registers, which we denote by $P_1$.
This register is also termed the multiplier register. The second
register of the AU has no feedback connection with the CU. It is
usually termed the multiplicand register and is denoted by $P_2$.
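The feedback from the lowest place of the multiplier register is what makes the familiar shift-and-add method of multiplication possible. The Python sketch below is one common form of that method, offered as an illustration rather than as the machine's actual microprogram: sign handling is omitted, and the multiplicand register is allowed to grow past its nominal width (a real machine would use a double-length result or shift the summator instead).

```python
def multiply(multiplicand: int, multiplier: int, places: int = 16) -> int:
    """Shift-and-add multiplication of non-negative codes (sign handling omitted)."""
    if not 0 <= multiplier < (1 << places):
        raise ValueError("multiplier does not fit in the register")
    summator = 0
    multiplicand_reg = multiplicand
    multiplier_reg = multiplier
    for _ in range(places):
        if multiplier_reg & 1:          # feedback: lowest place of the multiplier register
            summator += multiplicand_reg
        multiplier_reg >>= 1            # right shift: toward the lower digits
        multiplicand_reg <<= 1          # left shift: toward the higher digits
    return summator


print(multiply(6, 7))   # 42
```

The control unit only ever needs one bit of information per step, the lowest digit of the multiplier register, to decide whether to add.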
As a rule, the MU contains only two registers, one of which is
termed the number register and the other the address register. The
address register stores the address of the MU cell with which an
operation (writing or reading of a code) is to be performed, and the
number register holds the number code being selected from the MU or
being sent there for storage. In addition, there are usually two other
channels for receiving signals from the CU on the nature of the coming
operation (writing or reading). The transmission of a pulse along one
of these channels (the write channel) causes the code set in the number
register to be memorized (written) in the MU cell whose address
coincides with the code set in the address register. The transmission
of a pulse along the other channel (the read channel) causes the code
from the MU cell whose address is set in the address register to be
transferred to the (previously cleared) number register.
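The behaviour of the two MU registers under write and read pulses can be sketched in Python. The class is a simplified model with hypothetical names; it also reflects the memory properties stated in §1: writing erases the previous contents of the cell, while reading leaves the cell unchanged.

```python
from typing import List


class MemoryUnit:
    """MU with an address register and a number register (a simplified model)."""

    def __init__(self, size: int) -> None:
        self.cells: List[int] = [0] * size
        self.address_register = 0
        self.number_register = 0

    def write_pulse(self) -> None:
        # Write channel: the code in the number register replaces (erases) the
        # contents of the cell whose address is in the address register.
        self.cells[self.address_register] = self.number_register

    def read_pulse(self) -> None:
        # Read channel: the number register receives a copy of the addressed cell
        # (assignment also plays the role of the prior clearing); the cell keeps its code.
        self.number_register = self.cells[self.address_register]


mu = MemoryUnit(8)
mu.address_register, mu.number_register = 3, 42
mu.write_pulse()
mu.number_register = 0
mu.read_pulse()
print(mu.number_register, mu.cells[3])   # 42 42
```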

The two described MU microoperations are termed respectively the
microoperations of MU read-in and read-out. In addition to these
microoperations, provision is made in the MU for the MU register
clearing microoperation and also for the microoperations of
transferring codes from the AU and CU registers into the number and
address registers, and the reverse transfers from the MU into the AU
and CU. When we speak of the transfer of a code from register to
register, we shall always mean, unless stipulated otherwise, ordinary
transfer, i.e., transfer of the code without change of its sign.

Let us analyze the structure of the CU, following the scheme of
microprogram control first described by Wilkes and Stringer [75].
The microprogram control unit contains two registers, termed
respectively the command register and the microoperation register.
The command register stores the command (instruction) currently being
performed. In accordance with the accepted structure of the
instructions, the command register is subdivided into several
registers: the operation register, the first address register, the
second address register, etc. In describing microprograms it is
sometimes convenient to work with the command register as a whole,
and sometimes to break it up into its component parts.

The microoperation register, sometimes also termed the microcommand
register, stores the code of the microprogram instruction
(microcommand) being performed at the given moment, i.e., the code
denoting the ensemble of microoperations being performed at that
moment.

In addition to the command and microoperation registers, universal
program automata with the natural order of succession of instructions
(see §1 of the present chapter) have one more register, termed the
command counter. When a pulse is applied to a special input of this
register, the number code previously set in it is increased by one.
If the command counter is cleared beforehand, it will obviously count
the pulses arriving at its input; this is where the register derives
its name. The command counter serves for the storage of command
addresses. In the course of performing any instruction other than a
conditional transfer, the contents of the command counter are
increased by one and a new instruction is selected from the MU at the
address thus obtained.
Increasing the contents of the command counter by one is one of the
microoperations of the CU. Among the other microoperations provided
for in the CU we note the clearing of the registers, the transfer of
codes from the MU number register into the command register, the
transfer of codes from the CU address registers (first, second, etc.)
into the MU address register, the transfer of a code from the command
register into the summator (to accomplish the readdressing operation),
and the transfer of the code from one of the CU address registers
(usually the third) into the command counter (when performing the
conditional transfer operation). To perform the logical operations
mentioned in the preceding section, several new microoperations are
added to those of the arithmetic unit.

As for the microprogram control unit itself, in the Wilkes and
Stringer scheme it includes, in addition to the microoperation
register mentioned above, the so-called microoperation decoder and two
diode matrices,* termed the A-matrix and the B-matrix. A simplified
symbolic circuit of the microprogram control unit is shown in Fig. 19.
In this figure the letters POn denote the operation register (a
component part of the CU command register) and PMO denotes the
microoperation register. The dots designate the points of connection
of the diodes which connect the horizontal buses of the matrix with
the vertical buses. The purpose of the diodes is to pass pulses in the
forward direction (from the horizontal buses to the vertical) and to
block their passage in the reverse direction. Under this condition a
pulse applied to any horizontal bus D goes to those and only those
vertical buses with which the bus D is connected by diodes.
If the buses were connected directly (without diodes), false paths
for the pulses, not intended by the designer, would be possible. For
example, a pulse applied to the second-from-the-top horizontal bus of
the A-matrix would appear not only on the first vertical bus from the
left but also on the third from the left, reaching it through the
second-from-the-bottom horizontal bus. The use of diodes eliminates
the possibility of such false paths, since pulses cannot pass back
from the vertical buses to the horizontal ones.

[Fig. 19. 1) Operation register; 2) microoperation register;
3) synchronizing pulse; 4) decoder; 5) A-matrix; 6) B-matrix.]

The purpose of the decoder shown in Fig. 19 is to transmit each
successive pulse SI of the synchronizing generator applied to its
input to exactly one of the horizontal buses of the A-matrix, namely
the one uniquely determined by the codes set at the given moment in
the operation and microoperation registers. Some of the horizontal
buses of the A-matrix continue into one horizontal bus of the B-matrix
and some into two such buses. In the latter case the path of the pulse
into the horizontal buses of the B-matrix is determined by the
feedback signal (the signal p or q in Fig. 19) coming from the AU.

Passing to the vertical buses of the B-matrix (as determined by the
connection of the diodes), the pulses enter the microoperation
register and alter the code previously set there. Therefore the next
SI pulse, passing through the decoder, will go to a new horizontal
bus. By connecting the vertical buses of the A-matrix to the
corresponding units of the machine so that a pulse on each vertical
bus causes the performance of precisely one microoperation, we can
use this process to accomplish any finite sequence of microoperations
(a microprogram), combining several microoperations into one
microcommand.

By writing the microprograms for the various operations which the
machine must perform, we determine how the diodes are connected in the
A-matrix and the B-matrix and consequently the structure of the entire
microprogram control unit. During the microprogram of any machine
operation the contents of the operation register remain unchanged,
while the contents of the microoperation register change with every
new elementary cycle. Only at the very end of the operation, after the
new instruction has been selected from memory, do the contents of the
operation register change, after which the microprogram of the next
operating cycle of the machine begins.
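The table-driven character of this scheme can be illustrated with a minimal Python sketch. Each row of the hypothetical table `rows` stands for one horizontal bus selected by the decoder: its first part lists the microoperations fired through the A-matrix, and its second part gives the next contents of the microoperation register as set through the B-matrix, either directly or, for a forked bus, according to a feedback signal. The row names, the toy absolute-value microprogram, the signal `p_negative` (standing in for the sign signal p), and the assumption that the feedback signal is sampled after the cycle's microoperations have taken effect are illustrative choices, not details of the original circuit; the operation register, which would also take part in selecting the row, is omitted.

```python
from typing import Callable, Dict, List, Optional, Tuple, Union

# Next-row entry of a B-matrix bus: a row name, None (end of microprogram),
# or a fork (signal_name, row_if_signal_false, row_if_signal_true).
Next = Union[None, str, Tuple[str, Optional[str], Optional[str]]]
Row = Tuple[List[str], Next]


def run_microprogram(rows: Dict[str, Row], start: str,
                     micro_ops: Dict[str, Callable[[], None]],
                     signals: Dict[str, Callable[[], bool]],
                     max_cycles: int = 1000) -> List[str]:
    """Execute a table-driven microprogram, one synchronizing pulse per row.

    Returns the sequence of rows (microoperation-register contents) visited.
    """
    trace: List[str] = []
    row: Optional[str] = start
    for _ in range(max_cycles):
        if row is None:
            return trace
        trace.append(row)
        a_part, b_part = rows[row]
        for name in a_part:            # A-matrix: fire each connected microoperation
            micro_ops[name]()
        if isinstance(b_part, tuple):  # forked B-matrix bus: a feedback signal decides
            signal, if_false, if_true = b_part
            row = if_true if signals[signal]() else if_false
        else:                          # single B-matrix bus
            row = b_part
    raise RuntimeError("microprogram did not terminate")


def absolute_value(x: int) -> int:
    """Toy microprogram: load x into the summator, change its sign if negative."""
    regs = {"cell": x, "summator": 0}

    def clear_summator() -> None:
        regs["summator"] = 0

    def transfer_to_summator() -> None:
        regs["summator"] += regs["cell"]

    def change_sign() -> None:
        regs["summator"] = -regs["summator"]

    rows: Dict[str, Row] = {
        "m0": (["clear_summator"], "m1"),
        "m1": (["transfer_to_summator"], ("p_negative", None, "m2")),
        "m2": (["change_sign"], None),
    }
    run_microprogram(
        rows, "m0",
        micro_ops={"clear_summator": clear_summator,
                   "transfer_to_summator": transfer_to_summator,
                   "change_sign": change_sign},
        signals={"p_negative": lambda: regs["summator"] < 0},
    )
    return regs["summator"]


print(absolute_value(-5), absolute_value(7))  # 5 7
```

In this model, writing a new microprogram amounts to adding rows to the table, much as choosing the microprograms fixes where the diodes are placed in the two matrices.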

Since the structure of the control unit and even of the entire machine
is to a considerable degree determined by the choice of operations and
of their microprograms, let us consider concrete examples of
microprograms for the most common machine operations. For definiteness
we shall consider that the machine under discussion is a three-address
parallel universal digital machine with natural order of succession of
commands. We use the letter p to denote the sign function (±1) of the
number located in the summator, and q to denote the value of the
lowest digit of the number in the multiplier register P₂ (it is
assumed that the machine operates with n-place proper binary
fractions).

Addition Microprogram
1. Clear AU and MU registers.
2. Transfer code from CU first address register into MU address
register.
3. MU read-out.
4. Transfer code from MU number register into AU summator.
5. Clear MU registers.
6. Transfer code from CU second address register into MU address
register.
7. MU read-out.
8. Transfer code from MU number register into AU summator (most
frequently via the AU register P₁).
9. Clear MU registers.
10. Transfer code from AU summator into MU number register.
11. Transfer code from CU third address register into MU address
register.
12. MU read-in.
13. Clear MU registers.
14. Increase command counter content by one.
15. Transfer code from command counter into MU address register.
16. MU read-out.
17. Clear command register.
18. Transfer code from MU number register into command register.
Certain of the microoperations making up this microprogram can be
combined in time, which shortens the operating cycle and increases
the speed of the machine. Examples of such combinable microoperations
are microoperations 16 and 17, or microoperations 10 and 11.
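The register transfers of the addition microprogram can be traced step by step with a small Python model. It is a sketch under stated assumptions: the MU is a dictionary from addresses to words, a three-address instruction is stored as a tuple, "clearing" a register sets it to 0, and a transfer into the summator adds the incoming code to the summator contents (which is how step 8 forms the sum). The class, field, and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union

Word = Union[int, Tuple[str, int, int, int]]  # a number or a three-address instruction


@dataclass
class Registers:
    addr1: int             # CU first address register
    addr2: int             # CU second address register
    addr3: int             # CU third address register
    counter: int           # CU command counter
    summator: int = 0      # AU summator
    p1: int = 0            # AU register P1
    mu_address: int = 0    # MU address register
    mu_number: Word = 0    # MU number register
    command: Word = 0      # CU command register


def addition_microprogram(r: Registers, mu: Dict[int, Word]) -> None:
    """Perform microoperations 1-18 of the addition microprogram in order."""
    r.summator, r.p1 = 0, 0               # 1. clear AU registers ...
    r.mu_address, r.mu_number = 0, 0      #    ... and MU registers
    r.mu_address = r.addr1                # 2. first address -> MU address register
    r.mu_number = mu[r.mu_address]        # 3. MU read-out
    r.summator += r.mu_number             # 4. number register -> summator
    r.mu_address, r.mu_number = 0, 0      # 5. clear MU registers
    r.mu_address = r.addr2                # 6. second address -> MU address register
    r.mu_number = mu[r.mu_address]        # 7. MU read-out
    r.p1 = r.mu_number                    # 8. number register -> summator via P1
    r.summator += r.p1
    r.mu_address, r.mu_number = 0, 0      # 9. clear MU registers
    r.mu_number = r.summator              # 10. summator -> MU number register
    r.mu_address = r.addr3                # 11. third address -> MU address register
    mu[r.mu_address] = r.mu_number        # 12. MU read-in (write into the cell)
    r.mu_address, r.mu_number = 0, 0      # 13. clear MU registers
    r.counter += 1                        # 14. command counter + 1
    r.mu_address = r.counter              # 15. command counter -> MU address register
    r.mu_number = mu[r.mu_address]        # 16. MU read-out, which may be combined with
    r.command = 0                         # 17. clearing the command register
    r.command = r.mu_number               # 18. number register -> command register


mu: Dict[int, Word] = {1: 5, 2: 7, 3: 0, 11: ("add", 1, 2, 3)}
regs = Registers(addr1=1, addr2=2, addr3=3, counter=10)
addition_microprogram(regs, mu)
print(mu[3], regs.counter, regs.command)  # 12 11 ('add', 1, 2, 3)
```

Steps 16 and 17 touch different registers, which is why the text can combine them into one elementary cycle; the sequential model simply performs them one after the other.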
Microprogram for Multiplication by Positive Number
1. Clear AU and MU registers.
2. Transfer code from CU first address register into MU address
register.
3. MU read-out.
4. Transfer code from MU number register into AU register P₁.
5. Clear MU registers.
6. Transfer code from CU second address register into MU address
register.
7. MU read-out.
8. Transfer code from MU number register into AU register P₂.
9. (1) Transfer code from register P₁ into summator if q = 1; go to
the following microoperation without number transfer if q = 0.
10. (1) Right shift on summator.
11. (1) Right shift on register P₂.
9. (2) Same as 9(1).
10. (2) Same as 10(1).
11. (2) Same as 11(1).
...
9. (n) Same as 9(1).
10. (n) Same as 10(1).
11. (n) Same as 11(1).
12. Clear MU registers.
13. Transfer code from CU third address register into MU address
register.
14. Transfer code from summator into MU number register.
15. MU read-in.
16. Clear MU registers.
17. Add one to content of command counter.
18. Transfer code from command counter into MU address register.
19. MU read-out, clear command register.
20. Transfer code from MU number register into command register.


In the multiplication microprogram, in contrast with the addition
microprogram, considerable use is made of branching in the order of
succession of the microoperations as a function of the feedback signal
q coming from the AU. It is not difficult to see that the sequential
performance of microoperations 9, 10, 11 (n times) is equivalent to
the usual, well-known multiplication algorithm with roundoff in the
binary notation system.

Indeed, in the described technique the AU summator stores the sums of
the partial products of the multiplicand by the individual digits of
the multiplier. This storage is accurate up to the lower digits, which
are dropped in the right shift of the summator contents. Each time,
the multiplicand is either added or not added to the partial sum
stored in the summator, depending on whether the lowest digit of the
right-shifted multiplier is equal to one or zero. Together with the
preceding shifts of the summator and the P₂ register, this procedure
amounts to adding the product of the multiplicand by the next digit of
the multiplier (moving from right to left) to the previous sum of such
partial products, as the conventional multiplication algorithm
requires. Roundoff to the number of significant digits contained in
the factors is accomplished by the shifts of the summator, which
eliminate the lowest digits of the product. For multiplication not
only by positive but also by negative numbers, an additional
microoperation forming the sign of the product is introduced into the
microprogram.
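The effect of repeating microoperations 9, 10 and 11 n times can be checked with a short Python sketch. As an assumption of the model, an n-place proper binary fraction 0.d₁d₂...dₙ is held as the integer whose bits are d₁...dₙ (i.e. scaled by 2ⁿ), and the summator is allowed one extra high-order digit for the moment between an addition and the following shift; the function name is hypothetical.

```python
def multiply_fractions(multiplicand: int, multiplier: int, n: int) -> int:
    """Shift-and-add multiplication of two positive n-place binary fractions.

    A fraction 0.d1 d2 ... dn is held as the integer with bits d1 ... dn,
    i.e. scaled by 2**n. The result uses the same scaling.
    """
    if not (0 <= multiplicand < 2 ** n and 0 <= multiplier < 2 ** n):
        raise ValueError("operands must be n-place proper fractions")
    p1 = multiplicand   # register P1 (multiplicand)
    p2 = multiplier     # register P2 (multiplier); its lowest digit is q
    s = 0               # summator
    for _ in range(n):  # microoperations 9(k), 10(k), 11(k) for k = 1, ..., n
        q = p2 & 1
        if q == 1:
            s += p1     # 9: add the multiplicand only when q = 1
        s >>= 1         # 10: right shift of the summator drops its lowest digit
        p2 >>= 1        # 11: right shift of P2 exposes the next digit as q
    return s


print(format(multiply_fractions(0b1011, 0b0110, 4), "04b"))  # 0100
```

Because each shift discards the lowest digit, the result equals the exact product truncated to n places, i.e. `(multiplicand * multiplier) >> n`; in the example 0.1011 × 0.0110 = 0.01000010 is cut to 0.0100.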

Microprogram for Conditional Transfer on Inequality
1. Clear AU and MU registers.
2. Transfer code from CU first address register into MU address
register.
3. MU read-out.
4. Transfer code from MU number register into AU summator.
5. Clear MU registers.
6. Transfer code from CU second address register into MU address
register.
7. MU read-out.
8. Transfer code from MU number register into AU register.
9. Transfer code with sign change from this register into summator.
10. If the summator content s ≥ 0, add one to the content of the
command counter; if s < 0, transfer code from CU third address
register into command counter.
11. Clear MU registers.
12. Transfer code from command counter into MU address register.
13. MU read-out.
14. Clear command register.
15. Transfer code from MU number register into command register.

This microprogram compares the number A₁ stored at the first address
of the command with the number A₂ stored at its second address. If it
is found that A₁ ≥ A₂, control is transferred to the next command in
order. If, however, A₁ < A₂, the following command is taken from the
third address of the current command. It is not difficult to see that
from two operations of conditional transfer on inequality, with the
places of the numbers A₁ and A₂ interchanged, we can form the
operation of conditional transfer on exact coincidence of the words
described in the preceding section.
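The closing remark can be made concrete with a Python sketch. It assumes the reading s ≥ 0 → next command, s < 0 → third address for microoperation 10, treats the numbers as plain integers, and uses hypothetical function names. A pair of inequality transfers at addresses k and k + 1, the second with its operands swapped, lets control reach k + 2 only when A₁ ≥ A₂ and A₂ ≥ A₁, that is, when the words coincide.

```python
def transfer_on_inequality(a1: int, a2: int, counter: int, third_address: int) -> int:
    """Address of the next command after 'conditional transfer on inequality'.

    Microoperations 4-9 leave s = A1 + (-A2) in the summator; microoperation 10
    advances the command counter if s >= 0 and otherwise takes the third address.
    `counter` is the command counter contents (address of this command).
    """
    s = a1 - a2
    return counter + 1 if s >= 0 else third_address


def transfer_on_coincidence(a1: int, a2: int, k: int, target: int) -> int:
    """Two inequality transfers at addresses k and k + 1 with swapped operands.

    Control reaches k + 2 only if A1 >= A2 and A2 >= A1, i.e. A1 = A2;
    otherwise it goes to `target` (assumed to lie outside k, k + 1, k + 2).
    """
    after_first = transfer_on_inequality(a1, a2, k, target)   # jumps if A1 < A2
    if after_first == target:
        return target
    return transfer_on_inequality(a2, a1, k + 1, target)      # jumps if A2 < A1


print(transfer_on_coincidence(3, 3, 10, 50), transfer_on_coincidence(4, 3, 10, 50))  # 12 50
```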

The basic principles of organizing the algorithmic process in
universal electronic digital machines described above give a general
idea of the so-called block structure of such machines. In the actual
design of electronic program automata the stage of block synthesis is
only the starting point for the development of particular circuit
solutions. These solutions are selected on the basis of the theory of
automata and the theory of combination circuits presented in Chapters
1 and 3.

§3. THE CONCEPT OF PROGRAMMING 

Programming is the writing of a particular algorithm in the form of a
finite sequence of instructions (commands) for a universal program
automaton. Such a sequence is termed the operating program of the given
automaton. Entered into the automaton together with the initial data
(the input word of the algorithm), it forces the automaton to perform
the algorithm in question, i.e., to transform the given input word
(input information) into the corresponding output word (output
information). The universality of the set of operations of modern
program automata makes it possible to program any algorithm, provided
that we neglect the limitations imposed by the finite size of the
automaton memory.

In order to understand the essence of the problems posed in
programming, let us first consider a simple particular example.
Suppose that we are required to calculate a sum of the form
$S_m = \sum_{k=1}^{m} \frac{1}{k^2}$, where m is some fixed natural
number. We shall compile the program for solving this problem on a
three-address universal digital machine with natural order of
succession of commands, whose set of operations includes all four
arithmetic operations (addition, subtraction, multiplication and
division), the operation of conditional transfer on exact word
coincidence, and the operation of machine stop.

Let us assume for simplicity that there is no need for input and
output of information in the machine MU. In other words, the initial
information and the program to be compiled are assumed to have been
entered in the proper memory cells, and after the machine stops the
solution of the problem is obtained in the MU cells (in a single cell
in the case considered) assigned for this purpose.

For the solution of the posed problem we introduce the following
notations:

$$x_k = k, \qquad y_k = x_k^2, \qquad z_k = \frac{1}{y_k}, \qquad S_k = z_1 + z_2 + \dots + z_k = S_{k-1} + z_k.$$

We shall assume that four memory cells, termed the working cells, are
assigned for the storage of the quantities $x_k$, $y_k$, $z_k$ and
$S_k$. It is natural to break the computation of the desired sum $S_m$
down into m completely identical steps, in each of which we compute
the corresponding value of the partial sum $S_k$ ($k = 1, 2, \dots, m$).
The computations begin with the values $k = 1$ and $S_0 = 0$ and can be
written in the form of the following scheme:

1) $y_k = x_k^2$;
2) $z_k = 1/y_k$;
3) $S_k = S_{k-1} + z_k$;
4) $x_{k+1} = x_k + 1$;
5) if $x_{k+1} = m + 1$, go to the following instruction; if
$x_{k+1} \ne m + 1$, return to instruction 1;
6) stop.

This scheme is in fact already the desired program, in whose
instructions the actual addresses of the working cells are replaced by
the symbolic designations $x_k$, $y_k$, $z_k$, $S_k$, termed symbolic
addresses. By fixing the actual addresses of these cells, and also of
the cells containing the constants 1 and m + 1 which appear in the
program, it is not difficult to go from the program with symbolic
addresses to the actual program in three-address instructions. Here we
shall assume that the conditional transfer instruction provides for
the natural succession of instructions when the compared words
(located at the first and second addresses of the conditional transfer
instruction) coincide, and transfers control to the third address
when they do not.

Let the addresses of the working cells $x_k$, $y_k$, $z_k$ and $S_k$ be
1, 2, 3 and 4 respectively; let the addresses of the cells containing
the constants 1 and m + 1 be 5 and 6; and let the address of the cell
containing the first program instruction be 7. Then the desired
program (in actual addresses) is written as the following sequence of
instructions:

1) multiplication 1, 1, 2; 

2) division 5, 2, 3; 

3) addition 4, 3, 4; 

4) addition 1, 5, 1; 

5) conditional transfer 1, 6, 7; 

6) stop. 

Initially the cell with address 1 must contain a number equal to one,
and the cell with address 4 a number equal to zero. The initial
contents of the working cells with addresses 2 and 3 are evidently
unimportant, since the required contents of these cells are produced
by the first and second instructions of the program. We recall that
the machine memory is presumed to be constructed so that writing any
word into any cell erases the previous content of this cell. After the
machine stops (by the sixth instruction), the sought value of the sum
$S_m$ is obtained in the fourth memory cell.

It is not difficult to see that in this program the cell with address
2 can be used not only for storing the quantity $y_k$ but also for
storing the quantity $z_k$. Indeed, the quantity $y_k$ is used
exclusively for calculating $z_k$; therefore, after $z_k$ has been
calculated, the value found can be sent to the cell where $y_k$ was
previously stored without danger of interfering with the correctness
of the subsequent calculations. There exist general techniques which
permit automating such economizing of working cells.
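To see the six-instruction program at work, here is a minimal Python sketch of a three-address machine that keeps data and instructions in one memory. The operation names, the `("stop", 0, 0, 0)` padding, and the use of exact `Fraction` arithmetic in place of the machine's n-place binary fractions are assumptions of the sketch. The flag `reuse_cell_2` is our own rendering of the cell economy just described: it sends $z_k$ to cell 2, turning instructions 2 and 3 into "division 5, 2, 2" and "addition 4, 2, 4".

```python
from fractions import Fraction
from typing import Callable, Dict, Tuple, Union

Instruction = Tuple[str, int, int, int]
Cell = Union[Fraction, Instruction]

ARITHMETIC: Dict[str, Callable[[Fraction, Fraction], Fraction]] = {
    "addition": lambda u, v: u + v,
    "subtraction": lambda u, v: u - v,
    "multiplication": lambda u, v: u * v,
    "division": lambda u, v: u / v,
}


def run_three_address(memory: Dict[int, Cell], start: int, max_steps: int = 100_000) -> None:
    """Execute instructions in natural order of succession, beginning at `start`."""
    pc = start
    for _ in range(max_steps):
        op, a1, a2, a3 = memory[pc]
        if op == "stop":
            return
        if op == "conditional transfer":
            # coincidence of the compared words: next instruction; otherwise third address
            pc = pc + 1 if memory[a1] == memory[a2] else a3
            continue
        memory[a3] = ARITHMETIC[op](memory[a1], memory[a2])  # result to the third address
        pc += 1
    raise RuntimeError("step limit exceeded")


def sum_of_inverse_squares(m: int, reuse_cell_2: bool = False) -> Fraction:
    """S_m = 1/1^2 + ... + 1/m^2 computed by the six-instruction program (m >= 1)."""
    z = 2 if reuse_cell_2 else 3    # cell that receives z_k
    memory: Dict[int, Cell] = {
        1: Fraction(1),             # x_k, initially 1
        2: Fraction(0),             # y_k (initial content unimportant)
        3: Fraction(0),             # z_k (initial content unimportant)
        4: Fraction(0),             # S_k, initially 0
        5: Fraction(1),             # constant 1
        6: Fraction(m + 1),         # constant m + 1
        7: ("multiplication", 1, 1, 2),
        8: ("division", 5, 2, z),
        9: ("addition", 4, z, 4),
        10: ("addition", 1, 5, 1),
        11: ("conditional transfer", 1, 6, 7),
        12: ("stop", 0, 0, 0),
    }
    run_three_address(memory, start=7)
    return memory[4]


print(sum_of_inverse_squares(3))  # 49/36
```

Both variants give the same sum, which is the point of the remark about economizing working cells: cell 3 is simply never used when `reuse_cell_2=True`.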

As the second example let us consider programming the calculation of
the scalar product of two n-dimensional vectors
$A = (a_1, a_2, \dots, a_n)$ and $B = (b_1, b_2, \dots, b_n)$, i.e., the
computation of the sum s of the pairwise products of their
corresponding components: $s = a_1 b_1 + a_2 b_2 + \dots + a_n b_n$.
Let us assume that the components of the vector A are placed in the
cells with the addresses a + 1, a + 2, ..., a + n, and the components
of the vector B in the cells with the addresses b + 1, b + 2, ...,
b + n, where a and b are certain fixed natural numbers chosen so that
the arrays of cells assigned for storing the components of the vectors
A and B are disjoint.

For the sake of variety let us compile the program for calculating the
scalar product on a single-address machine with natural order of
succession of commands. In this case the instructions which realize
the arithmetic operations are understood so that the corresponding
operation is performed on a pair of numbers, the first of which is in
the AU summator and the second in the cell whose address is indicated
in the instruction. The result of the operation remains in the
summator. To perform arithmetic operations with such instructions it
is also necessary to have instructions which exchange codes between
the AU summator and the MU cell whose address is indicated in the
corresponding instruction.

The conditional transfer command can be performed in various ways.
Let us assume that we have a conditional transfer on zero in the
summator: if, when a conditional transfer command is performed, the
number 0 is written in the AU summator, control is transferred to the
next command in order; otherwise the following command is selected
from the address indicated in the conditional transfer command. It is
assumed for simplicity that the ith command of the program is stored
in the cell with address i (i = 1, 2, ...). Assigning the cell with
address c to store the readdressing constant (a number equal to one),
the cell with address a to store the number n (the dimension of the
vectors A and B), and the cell with address s to store the partial sum
$s_k = a_1 b_1 + a_2 b_2 + \dots + a_k b_k$, we arrive at the following
program:

1) transfer to summator from MU, a + 1; 

2) multiply, b + 1; 

3) add, s;

4) transfer from summator to MU, s; 

5) transfer to summator, 1; 

6) add, c; 

7) transfer from summator to MU, 1; 

8) transfer to summator, 2; 

9) add, c; 

10) transfer from summator to MU, 2; 


11) transfer to summator, t; 


12) add, c; 
13) transfer from summator to MU, t; 
14) transfer to summator, a; 
15) subtract, t;

16) conditional transfer on zero in summator, 1;

17) stop. 

The cell with address t which appears in this program is the so-called
program counter of the number of cycles. Instructions 11, 12 and 13
increase the content of this cell by one. If at the beginning of the
computation the cell t contained zero, then after n cycles
(repetitions of the group of instructions 1-13) it will contain the
number n. Then the subtraction performed by instructions 14 and 15
gives zero in the summator, and the subsequent conditional transfer
command transfers control to command 17, which stops the machine.

Until this moment the conditional transfer command transfers control
to the first command, so that the computation cycle is repeated.
However, this will not be a literal repetition, since commands 5, 6, 7
and 8, 9, 10 increase by one the addresses in the first and second
commands of the program. Therefore commands 1 and 2 form the product
$a_k b_k$ of a new pair of components of the vectors A and B, and
commands 3 and 4 compute and store in cell s the new value of the
partial sum $s_k = s_{k-1} + a_k b_k$.

For a proper understanding of this program it should be noted that
transferring any code into the summator presupposes clearing the
summator beforehand, and that after a code is transferred from the
summator to the MU the summator is automatically cleared as well. In
addition, the cell t must contain zero at the beginning of operation.
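The self-modifying character of this program shows up clearly in a small Python simulation of a single-address machine. The word format is a hypothetical encoding chosen so that adding the constant 1 from cell c to an instruction word increases its address by one, as commands 5-10 require: a word is `opcode * 1000 + address`. The opcode names, the concrete addresses a = 100, b = 200, c = 300, s = 301, t = 302, and the limit n < 100 are likewise assumptions of the sketch.

```python
from typing import Dict, List, Tuple

BASE = 1000  # hypothetical word format: opcode * BASE + address
LOAD, MULTIPLY, ADD, SUBTRACT, STORE, ZERO_TRANSFER, STOP = range(1, 8)


def word(opcode: int, address: int = 0) -> int:
    return opcode * BASE + address


def run_single_address(memory: Dict[int, int], start: int = 1,
                       max_steps: int = 100_000) -> None:
    summator = 0
    pc = start
    for _ in range(max_steps):
        opcode, address = divmod(memory[pc], BASE)
        pc += 1                           # natural order of succession
        if opcode == LOAD:                # clear summator, then transfer from MU
            summator = memory[address]
        elif opcode == MULTIPLY:
            summator *= memory[address]
        elif opcode == ADD:
            summator += memory[address]
        elif opcode == SUBTRACT:
            summator -= memory[address]
        elif opcode == STORE:             # transfer to MU; summator cleared afterwards
            memory[address] = summator
            summator = 0
        elif opcode == ZERO_TRANSFER:     # next command if summator is 0, else jump
            if summator != 0:
                pc = address
        elif opcode == STOP:
            return
        else:
            raise ValueError(f"unknown opcode {opcode}")
    raise RuntimeError("step limit exceeded")


def scalar_product(A: List[int], B: List[int]) -> Tuple[int, int]:
    """Return (A . B, final contents of cell 1) after running the 17 commands."""
    n = len(A)
    if n != len(B) or not 1 <= n < 100:
        raise ValueError("need 1 <= n < 100 components in each vector")
    a, b, c, s, t = 100, 200, 300, 301, 302
    memory = {c: 1, a: n, s: 0, t: 0}
    for i in range(1, n + 1):
        memory[a + i] = A[i - 1]
        memory[b + i] = B[i - 1]
    program = [
        word(LOAD, a + 1), word(MULTIPLY, b + 1), word(ADD, s), word(STORE, s),
        word(LOAD, 1), word(ADD, c), word(STORE, 1),
        word(LOAD, 2), word(ADD, c), word(STORE, 2),
        word(LOAD, t), word(ADD, c), word(STORE, t),
        word(LOAD, a), word(SUBTRACT, t), word(ZERO_TRANSFER, 1), word(STOP),
    ]
    for i, w in enumerate(program, start=1):   # command i in cell i
        memory[i] = w
    run_single_address(memory)
    return memory[s], memory[1]


print(scalar_product([1, 2, 3], [4, 5, 6]))  # (32, 1104)
```

The second value returned is the content of cell 1 after the run: it now refers to address a + 4 instead of a + 1, so the program cannot be rerun without first restoring its original address portions.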
The described method of address substitution (changing the addresses
of commands) is inconvenient because the codes of the commands
themselves are altered. First, it lengthens the program (particularly
noticeably in single-address machines), and second, and this is most
important, its use alters the initially given program, which is then
unsuitable for further use without first restoring the original
values of the address portions of the commands. This restoration
further complicates the program and requires additional MU cells for
storing the original addresses. Therefore in the majority of modern
universal digital machines preference is given to another method of
readdressing, associated with the use of the so-called address
modification registers, or index registers.

The index registers are a part of the control unit of the uni- 
versal program automaton and have the property that in the process of 
the performance of any command the contents of a particular one of 
them is automatically added to those command addresses which are 
equipped with a special label corresponding to this index register. 

With the use of a single index register I in the three-address command
system, the program for computing the scalar product of the vectors
can be written with only five commands in all (the designations of the
cell addresses are the same as in the preceding program; p denotes a
working cell):

1) add 1 to the content of the index register I; 

2) multiply, a(I), b(I), p; 

3) add, p, s, s;

4) conditional transfer, I, a, 1; 

5) stop. 

In this program use is made of conditional transfer on exact
coincidence of the codes in the index register I and in the cell a
(where the dimension n of the vectors A and B is written). If these
codes coincide, control is transferred to the next (fifth) command;
otherwise control is transferred to the first command of the program.

Throughout the entire operation the program retains its initial form;
only the content of the index register I changes. If necessary, the
program may include a special command for clearing the index register,
i.e., setting it to zero.
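A Python sketch of the five-command version shows the index register doing the readdressing. Each operand is modelled as a pair (address, label), and the effective address adds the content of the labelled index register, as the text describes. The operation names, the concrete cell addresses a = 100, b = 200, s = 301, p = 303, and keeping the program in a separate table rather than in memory cells are assumptions made for brevity.

```python
from typing import Dict, List, Optional, Tuple

Operand = Tuple[int, Optional[str]]  # (address, label of index register or None)


def scalar_product_indexed(A: List[int], B: List[int]) -> int:
    """Scalar product by the five-command program using one index register I."""
    n = len(A)
    if n != len(B) or not 1 <= n < 100:
        raise ValueError("need 1 <= n < 100 components in each vector")
    a, b, s, p = 100, 200, 301, 303           # hypothetical cell addresses
    memory: Dict[int, int] = {a: n, s: 0, p: 0}
    for i in range(1, n + 1):
        memory[a + i], memory[b + i] = A[i - 1], B[i - 1]
    index = {"I": 0}                           # index register, cleared beforehand

    def ea(operand: Operand) -> int:
        """Effective address: the labelled index register is added automatically."""
        address, label = operand
        return address + (index[label] if label else 0)

    program = {
        1: ("add 1 to index", "I"),
        2: ("multiply", (a, "I"), (b, "I"), (p, None)),
        3: ("add", (p, None), (s, None), (s, None)),
        4: ("conditional transfer", "I", a, 1),
        5: ("stop",),
    }
    pc = 1
    while True:
        op, *args = program[pc]
        pc += 1
        if op == "add 1 to index":
            index[args[0]] += 1
        elif op == "multiply":
            x, y, z = args
            memory[ea(z)] = memory[ea(x)] * memory[ea(y)]
        elif op == "add":
            x, y, z = args
            memory[ea(z)] = memory[ea(x)] + memory[ea(y)]
        elif op == "conditional transfer":     # coincidence: go on; otherwise jump
            register, cell, target = args
            if index[register] != memory[cell]:
                pc = target
        elif op == "stop":
            return memory[s]


print(scalar_product_indexed([1, 2, 3], [4, 5, 6]))  # 32
```

No command is ever rewritten here; only `index["I"]` changes, which is why the same program can be run again after the index register is cleared.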

In programming more complex algorithms, for example the algorithm for
multiplying a vector by a matrix, the need arises to use several index
registers for storing readdressing constants which change at different
steps. Let us consider as an example the multiplication of the
n-dimensional vector $B = (b_1, b_2, \dots, b_n)$ by the matrix A of nth
order with the elements $a_{ik}$ ($1 \le i, k \le n$).

Let us assume that the successive components of the vector B are
located in the memory cells with the addresses b + 1, b + 2, ..., b + n,
and the elements $a_{ik}$ of the matrix A are in the memory cells with
the addresses a + (k - 1)n + i (i, k = 1, 2, ..., n). The components of
the vector C = BA are to be placed in the cells with the addresses
c + 1, c + 2, ..., c + n (which initially contain zeros). We use the
cell with address t as the working cell for storing intermediate
results (its initial content is unimportant). Finally, the cells with
addresses 1, 2, ... are used for storing the successive instructions
which constitute the sought program.

Let us introduce three index registers I₁, I₂, I₃, which must be
cleared before operation begins, and let us place in the cell with
address d the number n, equal to the dimension of the vector B and
the order of the matrix A. With these notations the desired program
for multiplying the vector by the matrix is written in the following
form:

1) add 1 to the content of the index register I₃;

2) add 1 to the content of the index register I₁;

3) add 1 to the content of the index register I₂;

4) multiply, b(I₁), a(I₂), t;

5) add, c(I₃), t, c(I₃);

6) conditional transfer, I₁, d, 2;

7) clear index register, I₁;

8) conditional transfer, I₃, d, 1;

9) stop.
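The nine-command program can be checked with a Python sketch. It assumes the reading of the listing in which I₁ indexes the components of B, I₂ runs through the cells of A without ever being cleared, I₃ indexes the components of C, and cell d holds n. The operation names, the concrete addresses a = 100, b = 200, c = 300, d = 400, t = 401 (hence the limit n ≤ 9), and keeping the program apart from the data are assumptions of the sketch.

```python
from typing import Dict, List, Optional, Tuple

Operand = Tuple[int, Optional[str]]  # (address, index-register label or None)


def run_indexed(program: Dict[int, tuple], memory: Dict[int, int],
                index: Dict[str, int], max_steps: int = 1_000_000) -> None:
    """Three-address interpreter in which labelled addresses are modified by index registers."""
    def ea(operand: Operand) -> int:
        address, label = operand
        return address + (index[label] if label else 0)

    pc = 1
    for _ in range(max_steps):
        op, *args = program[pc]
        pc += 1
        if op == "increment":
            index[args[0]] += 1
        elif op == "clear":
            index[args[0]] = 0
        elif op == "multiply":
            x, y, z = args
            memory[ea(z)] = memory[ea(x)] * memory[ea(y)]
        elif op == "add":
            x, y, z = args
            memory[ea(z)] = memory[ea(x)] + memory[ea(y)]
        elif op == "conditional transfer":   # register equals cell: go on; else jump
            register, cell, target = args
            if index[register] != memory[cell]:
                pc = target
        elif op == "stop":
            return
        else:
            raise ValueError(f"unknown operation {op!r}")
    raise RuntimeError("step limit exceeded")


def vector_times_matrix(B: List[int], A: List[List[int]]) -> List[int]:
    """C = BA, where A[i][k] (0-based) holds the element a_{ik}."""
    n = len(B)
    if len(A) != n or any(len(row) != n for row in A) or not 1 <= n <= 9:
        raise ValueError("need an n-vector and an n x n matrix, 1 <= n <= 9")
    a, b, c, d, t = 100, 200, 300, 400, 401
    memory: Dict[int, int] = {d: n, t: 0}
    for i in range(1, n + 1):
        memory[b + i] = B[i - 1]
        memory[c + i] = 0
        for k in range(1, n + 1):
            memory[a + (k - 1) * n + i] = A[i - 1][k - 1]   # a_{ik} at a + (k-1)n + i
    program = {
        1: ("increment", "I3"),
        2: ("increment", "I1"),
        3: ("increment", "I2"),
        4: ("multiply", (b, "I1"), (a, "I2"), (t, None)),
        5: ("add", (c, "I3"), (t, None), (c, "I3")),
        6: ("conditional transfer", "I1", d, 2),
        7: ("clear", "I1"),
        8: ("conditional transfer", "I3", d, 1),
        9: ("stop",),
    }
    run_indexed(program, memory, {"I1": 0, "I2": 0, "I3": 0})
    return [memory[c + k] for k in range(1, n + 1)]


print(vector_times_matrix([1, 2], [[1, 2], [3, 4]]))  # [7, 10]
```

Since I₂ is incremented in step with I₁ but never cleared, it sweeps the cells a + 1, ..., a + n² in order, so the jth outer cycle reads exactly the jth column $a_{1j}, \dots, a_{nj}$ of A.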

As the algorithms become more complicated, the difficulties of
programming grow more and more. This naturally suggests looking for
more economical methods of writing down the information about an
algorithm and using the universal program automaton itself to
translate such records automatically into actual operating programs.
This idea is the basis of automatic programming with the aid of the
so-called universal programming programs (translators).

The universal programming program is an algorithm, programmed for a
particular universal digital machine, for translating the statement of
any algorithm in a particular formal algorithmic language into the
instruction language of the given machine. As the formal algorithmic
language here we can, of course, select any of the languages described
in Chapter 1, for example the language of normal algorithm schemes.
However, such a choice would not facilitate but rather complicate the
solution of the programming problem, since stating an algorithm in
any of the abstract algorithmic schemes of the first chapter is, as a
rule, a considerably more difficult problem than programming in the
instruction language of modern universal digital machines. To
convince ourselves of this it is sufficient to consider the addition
operation, which universal digital machines perform with a single
instruction, while with normal algorithms, for example, it is
realized by a quite complex scheme containing many elementary
substitutions.

Therefore attempts have been made to develop universal algorithmic
languages which would retain the basic properties of the language of
modern universal digital machines but would permit a simpler and more
readable statement of the algorithms encountered in practice than
direct programming in "machine" languages. Among languages of this
sort we note, for example, the Fortran language (U.S.), the Polish
algorithmic language SAKO, the address language (Kiev, USSR), and
others.

The creation of practical algorithmic languages is important not only
because such languages facilitate programming, but also because a
sufficiently well developed practical algorithmic language can become
a generally accepted and generally understood language for writing
various algorithms. Thus, the ALGOL-60 language, developed by a group
of European and American scientists, has at present received wide
international acceptance. A detailed description of this language is
given in the following chapter; here we shall consider certain
techniques which facilitate direct programming in machine languages.

The first technique, already considered above, is the use of symbolic
addresses in place of the actual (numerical) addresses in the initial
stage of programming. The later assignment of actual values to the
symbolic addresses, together with economizing on working cells, is a
purely technical operation and is easily automated. In spite of its
simplicity, the method of symbolic addresses significantly simplifies
the programming of complex problems and, what is most important,
considerably reduces the number of errors.
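
The "purely technical operation" of replacing symbolic addresses by
numerical ones can be sketched in Python as a two-pass assignment. The
instruction format (label, operation, operand), the operation names and
the origin address are hypothetical; every symbol that is not an
instruction label is simply given its own working cell after the
program, without any economizing of cells.

```python
def assign_addresses(program, origin=0):
    """program: list of (label, operation, operand); label may be None and
    operand may be a symbolic name (str), a number, or None."""
    table = {}
    # Pass 1: each label receives the address of the instruction it marks.
    for offset, (label, _op, _arg) in enumerate(program):
        if label is not None:
            table[label] = origin + offset
    # Every other symbolic operand gets a working cell placed after the code.
    next_free = origin + len(program)
    for _label, _op, arg in program:
        if isinstance(arg, str) and arg not in table:
            table[arg] = next_free
            next_free += 1
    # Pass 2: rewrite the program with numerical addresses only.
    code = [(op, table[arg] if isinstance(arg, str) else arg)
            for _label, op, arg in program]
    return code, table


# Hypothetical program: x := x + y, repeated while the result is negative.
PROGRAM = [
    ("start", "load", "x"),
    (None, "add", "y"),
    (None, "store", "x"),
    (None, "jump_if_negative", "start"),
    (None, "stop", None),
]
code, table = assign_addresses(PROGRAM, origin=100)
```

With the origin at 100, the label start becomes 100 and the working
cells x and y become 105 and 106, the first cells after the five
instructions.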

A second method which can significantly simplify direct programming
is the inclusion of previously constructed simpler programs in more
complex ones. Programs specially adapted for inclusion in more complex
programs are usually termed subprograms. By accumulating a library of
subprograms, the programmer can in many cases reduce direct
programming to combining a small number of available subprograms. To
facilitate combining several subprograms into a single program,
special techniques have been worked out which make it possible to
include a subprogram in quite diverse programs without changing it.
The difficulty lies in the fact that the last instruction of the
subprogram must transfer control to some instruction of the basic
program, and this instruction changes from program to program and is
not known to the person who compiled the subprogram.

We can overcome this difficulty as follows: at the moment of transfer
to the subprogram, the address of the instruction to which the machine
must go after completing the subprogram is sent to a special memory
cell termed the return register. The subprogram must then end with a
special instruction, "transfer on the return register," which takes
the next instruction from the cell whose address is stored in this
register. We note that in order to use any given cell of the
operational memory as the return register, it is advisable to
introduce memory referral by means of the so-called second rank
address. With this referral, reading from or writing into the memory
is accomplished not at the addresses indicated in the instruction
being executed, but at the addresses stored in the memory cells whose
addresses are indicated in this instruction. When subprograms are
included in other subprograms (and not in the basic program), several
return registers or second rank address transfers must be used.
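
The following Python sketch models this linkage on a toy machine whose
instruction set (add, double, call, ret, stop) is hypothetical, as is
the separation of instructions from data cells. The call instruction
stores the return address in the cell it names, and ret plays the role
of "transfer on the return register": it takes the next instruction
address from the contents of the named cell, which is a second rank
(indirect) reference. The same subprogram is called from two different
points without being changed.

```python
def run(program, cells):
    """Execute instruction tuples from address 0 until "stop"."""
    pc = 0
    while True:
        op, *args = program[pc]
        if op == "add":              # cells[a] := cells[a] + k
            a, k = args
            cells[a] += k
            pc += 1
        elif op == "double":         # cells[a] := 2 * cells[a]
            (a,) = args
            cells[a] *= 2
            pc += 1
        elif op == "call":           # save the return address, then jump
            target, ret_cell = args
            cells[ret_cell] = pc + 1
            pc = target
        elif op == "ret":            # transfer on the return register:
            (ret_cell,) = args       # the jump address is the contents
            pc = cells[ret_cell]     # of the cell named in the instruction
        elif op == "stop":
            return cells
        else:
            raise ValueError(f"unknown operation {op!r}")


TOY_PROGRAM = [
    ("add", "x", 1),     # 0: x := x + 1
    ("call", 5, "R"),    # 1: call the subprogram at 5, return register R
    ("add", "x", 3),     # 2: x := x + 3
    ("call", 5, "R"),    # 3: same subprogram, different return point
    ("stop",),           # 4
    ("double", "x"),     # 5: subprogram body: x := 2x
    ("ret", "R"),        # 6: return to the address held in R
]
```

Starting from x = 0 the machine computes ((0 + 1)·2 + 3)·2 = 10. A
subprogram that itself called another one would have to give the inner
call a different return cell, since otherwise the outer return address
in R would be overwritten; this is why several return registers are
needed.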

Such use of some subprograms within others creates the basis for a
multistage organization of systems of standard programs. The
rationality of such an organization is determined by how economically
the library of standard programs is arranged in the particular memory
devices. This question becomes particularly important when various
standard subprograms which enrich the set of operations performed by
the machine are realized schematically, that is, in the machine's
circuitry. In this case the programmers' work is greatly economized,
since they can use large subprograms, each of which is assigned only a
single machine instruction. Such multistage organization of control is
realized in the "Promin'" computer of the Institute of Cybernetics of
the Academy of Sciences of the Ukrainian SSR (Kiev).

Moreover, even in the absence of such schematic realization, a
sufficiently extensive library of standard subprograms significantly
facilitates programming, since a considerable portion of newly
compiled programs will, as a rule, be made up of previously programmed
standard portions available in the library.

We note that in compiling a library of standard subprograms an
attempt is made to provide a high degree of generality with respect to
the problems being solved. For example, the standard subprogram for
matrix multiplication is compiled for matrices of any order rather
than for only one particular order. Other subprograms are constructed
similarly.
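
A generic subprogram of this kind might look as follows in Python. The
dimensions are taken from the arguments rather than fixed in advance,
which is the generality referred to above; the name mat_mul is our own.

```python
def mat_mul(a, b):
    """Multiply an n×m matrix a by an m×p matrix b (lists of rows),
    for any n, m and p."""
    m = len(b)
    if any(len(row) != m for row in a):
        raise ValueError("columns of a must equal rows of b")
    p = len(b[0]) if b else 0
    return [[sum(row[k] * b[k][j] for k in range(m)) for j in range(p)]
            for row in a]
```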
Also of significant assistance in direct programming is the
preliminary writing of the program in simplified form, usually termed
the program block diagram, with subsequent programming of each
individual block. A convenient method for writing program block
diagrams is the operator method proposed by Lyapunov.

In the use of this method groups of program commands of a single 
type which follow one another (for example, commands which realize the 
arithmetic operations) are combined into the so-called operators. The 
most widely used are the arithmetic operators and the readdressing 
operators (which change the content of the index registers). We label 
the arithmetic operators with the letter A, the logical operators with 
the letter P, the readdressing operators with the letter I, and the 
stop operator with the letter F. In addition, the operators are
numbered with special indices in the order in which they occur in the
program.

Combining a group of arithmetic operations, every arithmetic ope- 
rator is a coded designation for the operation of computation using a 
particular, frequently quite complex, formula. The logical operator 
makes a verification of the logical conditions on the basis of which 
particular conditional transfers are performed (control transfers in 
the program which violate the natural order of succession of commands). 
A vertical bar is placed after the logical operator; above and below
this bar are indicated the numbers of the operators to which control
is transferred when the logical condition is satisfied and,
correspondingly, when it is not satisfied. The absence of a number
above or below the bar indicates that in the corresponding case
control is transferred to the operator standing directly to the right
of the bar.
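
The transfer rules just described can be sketched in Python as an
interpreter for an operator diagram. The encoding is our own
assumption: operators are listed in their numbered order, a logical
operator P carries its condition plus the two numbers written above and
below its bar, and None stands for an absent number, meaning "go to the
operator on the right." The example diagram, which sums the elements of
a list, is hypothetical; in the notation above it would read
I₁ A₂ I₃ P₄ | F₅ with the number 2 above the bar.

```python
def run_scheme(scheme, state):
    """Run an operator diagram. scheme[k - 1] is operator number k:
    ("A" or "I", action), ("P", condition, if_true, if_false) or ("F", None)."""
    k = 1
    while True:
        kind, body, *targets = scheme[k - 1]
        if kind == "F":                  # stop operator
            return state
        if kind == "P":                  # logical operator with its bar
            if_true, if_false = targets  # numbers above and below the bar
            target = if_true if body(state) else if_false
            k = target if target is not None else k + 1
        else:                            # arithmetic or readdressing operator
            body(state)
            k += 1


SUM_SCHEME = [
    ("I", lambda s: s.update(i=0, total=0)),                       # I1
    ("A", lambda s: s.update(total=s["total"] + s["x"][s["i"]])),  # A2
    ("I", lambda s: s.update(i=s["i"] + 1)),                       # I3
    ("P", lambda s: s["i"] < len(s["x"]), 2, None),                # P4
    ("F", None),                                                   # F5
]
```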


With the aid of the described symbolism, the operator diagram of the
last of the programs we considered can be written as

I₁ I₂ A₃ P₄ | I₅ P₆ | F₇

Here the readdressing operator I₁ corresponds to the first instruction
of the program, and the operator I₂ corresponds to the two following
instructions. The arithmetic operator A₃ combines the fourth and fifth
instructions, and each of the remaining operators includes one
instruction.

To make operator diagrams easier to read, the bars which designate
conditional transfers can be supplied with horizontal lines above and
below, directed to the left. In this case a bar with a line directed
to the right, labeled with the same number as the corresponding bar of
the conditional transfer operator, is also placed ahead of the
operator to which control is transferred. A group of operators forming
a cycle, repeated several times as a result of the conditional
transfers, is then framed on both sides by a sort of "brackets" which
facilitate the search for such cycles.

Using these notations, the operator diagram described above can be
rewritten as

th \ IAsPo IPo F;. 

By supplying the operator diagram of the program with a description
of each operator occurring in it (other than the stop operator), we
can, once such a diagram has been compiled, turn to programming the
individual operators one after another and then combine the separate
pieces of the program thus compiled into a single whole. These
operations are to a considerable degree routine work and can be
automated relatively easily with the aid of any universal program
automaton.

We note that the arithmetic and logical operators can be described
by means of conventional arithmetic or logical formulas. With special
programs for the automatic translation of such formulas into the
language of machine instructions available, and by combining them with
standard library subprograms, it becomes possible to present a task to
the machine (universal program automaton) in the same form in which
it is presented to a skilled human computer. This method in effect
combines the method of standard subprograms with the method using (to
a certain extent) universal programming programs. It is therefore
natural to term it the specialized programming program method, or the
programming program library method [24]. The specialization consists
in the library being oriented to a certain class of typical problems,
which in effect permits programming to be eliminated entirely, so that
only the conditions of the problem to be solved need be communicated
to the machine.

§4. THE UNIVERSAL ALGORITHMIC LANGUAGE ALGOL-60

The international algorithmic language ALGOL-60, which for brevity
we shall term simply ALGOL, is a means for writing computational
algorithms quite simply, precisely and clearly. Being a universal
algorithmic language, it is of course suitable for writing any (not
necessarily computational) algorithms; however, when literal rather
than numerical information is processed, the simplicity and clarity
of the corresponding ALGOL text are to a considerable degree lost.
While programming computational algorithms in ALGOL is a far simpler
problem than direct programming for modern universal electronic
digital machines, programming problems of literal information
processing in ALGOL is only slightly simpler than using the "machine"
languages.

The basic symbols used in the construction of the ALGOL language 
are the Latin letters (26 capital and 26 lower case letters), the
Arabic numerals (from zero to nine inclusive), the logical values "true"
and "false," and also the operation symbols, separator symbols and 
brackets (the last three types of symbols are termed delimiters). There
are also a certain number of service words, for which words of the 
English language are usually used. It is customary to write these words 
in bold face type. 

Numbers are written in the decimal system, with the integral part
separated from the fractional part by a point (and not by a comma).
The plus sign ahead of positive numbers and the zero in the integral
part of a proper fraction can be dropped. To designate a decimal
exponent (the power of ten by which the number is multiplied) we use a
special symbol, a ten lowered below the base line, ₁₀ (it is usually
printed in bold face type). The numbers used in ALGOL are divided into
two types: integer and real. The integer type includes only the whole
numbers (with or without sign) whose writing contains neither the
decimal exponent symbol nor a decimal point; all remaining numbers
belong to the real type (thus the number 3.0 is real, not integer).

Examples of integer type numbers are: 0, +275, −0634, +0, −2.
Examples of real type numbers are: +5.34₁₀8 (i.e., the number
5.34·10⁸), −.063 (the number −0.063), −.37₁₀−2 (the number
−0.37·10⁻²), +₁₀9 (the number 10⁹), etc.
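
The rule separating the two types can be checked mechanically. The
Python sketch below is our own approximation of ALGOL number syntax,
not the official grammar text; it writes the lowered ten as the Unicode
character ⏨ (U+23E8, DECIMAL EXPONENT SYMBOL) and uses the ASCII
hyphen for the minus sign.

```python
import re

DIGITS = "[0-9]+"
DECIMAL = f"(?:{DIGITS}|[0-9]*\\.{DIGITS})"      # 275, 5.34, .063
EXPONENT = f"\u23e8[+-]?{DIGITS}"                # ⏨8, ⏨-2
REAL = re.compile(f"[+-]?(?:{DECIMAL}(?:{EXPONENT})?|{EXPONENT})")
INTEGER = re.compile(f"[+-]?{DIGITS}")


def number_type(text):
    """Return "integer", "real", or None if text is not an ALGOL number."""
    if INTEGER.fullmatch(text):      # no point and no exponent symbol
        return "integer"
    if REAL.fullmatch(text):
        return "real"
    return None
```

Note that 3.0 is classified as real, in agreement with the text, and
that a form such as 5. with no digits after the point is rejected.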

The introduction of particular quantities in ALGOL (in writing
specific programs) is preceded by their description. The subsequent
concrete representations of these quantities must be interpreted in
accordance with the indicated descriptions. If, for example, some
quantity x was described by the word integer and its value was then
introduced as, say, 23.4, this value must be understood as rounded off
to the nearest integer (23 in this case). A value of 23.5 for a
quantity of integer type is rounded off to 24 (the nearest larger
integer), and not to 23.* We note also that if the quantity x is
described as real, nothing prevents it from also taking integral
values; however, in subsequent operations with x we proceed just as
with any real quantity, without rounding off to the nearest integer.
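
The rounding rule can be written as entier(x + 0.5), which is how the
ALGOL 60 report itself defines the transfer from real to integer type;
in Python, entier corresponds to math.floor. Python's built-in round()
rounds halves to the nearest even integer and therefore does not follow
this rule. The function name to_integer is hypothetical.

```python
import math


def to_integer(value):
    """Round a real value as ALGOL does for integer quantities:
    entier(value + 0.5)."""
    return math.floor(value + 0.5)
```

Thus to_integer(23.4) gives 23 and to_integer(23.5) gives 24, while
round(24.5) in Python gives 24 and to_integer(24.5) gives 25.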

In ALGOL the various kinds of quantities (constants and variables)
are designated by so-called identifiers. Any finite sequence of
(Latin) letters and decimal digits that begins with a letter (and not
with a digit) can serve as an identifier. Examples of identifiers are
a7LO, x, ga, aPg, TOWW, etc. At the same time the expressions 7x, bab
or ab + x cannot serve as identifiers. Since quantities can be
designated not only by letters but also by words, i.e., sequences of
letters (possibly meaningless), the supply of identifiers is
potentially unlimited, which is quite important for the possibility of
representing any algorithm, no matter how complex. The possibility of
designating a quantity by its natural name, for example force,
current, etc., is also obviously convenient. At the same time there is
one inconvenience which we must bear in mind from now on: when
arithmetic expressions are constructed from identifiers, the
multiplication sign cannot be dropped (as is usually done in algebra),
since ALGOL will understand the expression ab + xy as the sum of two
quantities designated by ab and xy, and not as the sum of the products
of the quantities a, b and x, y.
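
The identifier rule, and the reason ab + xy names two quantities rather
than two products, can be illustrated with a regular expression. This
Python sketch restricts letters to the Latin alphabet as the text
requires; the helper names are hypothetical.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def is_identifier(text):
    """True if text is a Latin letter followed by Latin letters or digits."""
    return IDENTIFIER.fullmatch(text) is not None


def identifiers_in(expression):
    """List the identifiers that occur in an expression, left to right."""
    return IDENTIFIER.findall(expression)
```

Here identifiers_in("ab + xy") returns ["ab", "xy"]: two quantities,
with no implied multiplication.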

In addition to the quantities which take numerical values (of integer
and real type), ALGOL makes use of Boolean quantities, which take only
two values, "true" and "false." Boolean quantities are designated by
identifiers in just the same way as numerical quantities; in the
descriptions they are assigned to the Boolean type.

Similar quantities, for example the components of any vector or 
the elements of a particular matrix, are usually denoted by identi- 
fiers with one or several indices. The indices are written after the 
identifier and are enclosed in square brackets. Different indices are 
separated from one another by commas. Whole numbers (not just positive 
ones), variables and any arithmetic expressions which are always of 
the integer type, can be used as indices. 

Examples of variables with indices are: A[l, −2], ps[1], pbA8[i, J, 1].

Variables with indices which vary within certain limits constitute
the so-called arrays. An array description in ALGOL is preceded by the
English word array, before which the name of the type (integer, real,
Boolean) of the variables composing the array is placed (if the type
name is not indicated, the variables are considered to be of real
type). In the description of an array, after the array identifier,
the so-called list of bound pairs is written in index (square)
brackets. Each bound pair consists of two arithmetic expressions (or
numbers) separated by a colon. The first of these expressions is the
lower bound (smallest possible value) of the corresponding index, and
the second is its upper bound (highest possible value). It is assumed
that the index can run through all the integral values between the
lower and upper bounds inclusive; when the upper bound is less than
the lower, the corresponding array is considered indeterminate.

Examples of array descriptions: real array x[1:n, 0:m], Boolean array
g4[0:5], integer array N[−7:1, i:j, 3:3].

The number of different indices characterizing an array is termed
the dimension of the array. In the examples just presented the array x
has dimension 2 and the array g4 has dimension 1. As for the array N,
its dimension is formally 3; however, since the last index can take
only a single value (equal to 3), the array is in effect
two-dimensional.
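
A small Python model of bound pairs may make these rules concrete. It
is a sketch under our own assumptions: bounds are given as already
evaluated integer pairs, elements are initialized to 0 purely for
convenience (the text says nothing about initial values), and an empty
bound pair raises an error to stand for the indeterminate case. The
values i = 2, j = 4 for the array N are hypothetical.

```python
from itertools import product


def declare_array(bound_pairs):
    """Map every index tuple allowed by the bound pairs (each pair is
    (lower, upper), both inclusive) to an element value."""
    for lower, upper in bound_pairs:
        if upper < lower:
            raise ValueError(f"bound pair {lower}:{upper} leaves the array indeterminate")
    ranges = [range(lower, upper + 1) for lower, upper in bound_pairs]
    return {index: 0 for index in product(*ranges)}


def dimensions(bound_pairs):
    """Formal dimension, and the number of indices that can actually vary."""
    formal = len(bound_pairs)
    effective = sum(1 for lower, upper in bound_pairs if upper > lower)
    return formal, effective


# N[-7:1, i:j, 3:3] with the hypothetical values i = 2, j = 4
N_BOUNDS = [(-7, 1), (2, 4), (3, 3)]
N = declare_array(N_BOUNDS)
```

Here N has 9 · 3 · 1 = 27 elements, and dimensions(N_BOUNDS) gives
(3, 2): formally three-dimensional, in effect two-dimensional.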

We note that in descriptions of variables or arrays of the same type
the type name need be written only once, the corresponding identifiers
being separated from one another by commas. For arrays with the same
bounds, the index brackets with the corresponding list of bound pairs
can be written out only once, after the last identifier of an array
with these bounds. For example: real a, b, x7 or integer array A, B,
D[1:2, i:k]. The first description describes three variables which
take real values, and the second describes three two-dimensional
arrays with the same bounds, composed of integral quantities.

Semicolons are used to separate descriptions of variables or arrays
of different types. For example: real x, y; Boolean A, B, C; array
px[1:2, i:k]; integer array N[−1:0, 5:10], Q[2:4].

Arbitrary arithmetic expressions, which play a large role in the
construction of the ALGOL language, can serve as the index bounds of
arrays. Arithmetic expressions are constructed from numbers and
variables with the aid of six arithmetic operations: addition (denoted
by the sign +), subtraction (the sign −), multiplication (the sign ×),
division (the slash /), integral division (the sign ÷) and raising to
a power (the symbol ↑).

The integral quotient a ÷ b is the integral part of the ordinary
quotient a/b, obtained by rounding toward zero, i.e., in the direction
that reduces the modulus. This operation is applied only to quantities
of integer type, so that the expression 3.0 ÷ 5.0 is to be considered
indeterminate (while the expression 3 ÷ 5 is determinate and equal to
zero). The operation of raising to a power, a ↑ b (a to the power b),
with positive a is determinate for all quantities b of real and
integer type, while for negative a it is determinate only when b is of
integer type.
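
Python's own // operator rounds toward minus infinity, so it differs
from ÷ for operands of opposite sign (−7 // 2 is −4, whereas −7 ÷ 2 is
−3). The following sketch encodes both rules as stated here, with
hypothetical function names and Python's int standing for ALGOL's
integer type.

```python
def int_div(a, b):
    """a ÷ b: integer operands only, quotient truncated toward zero."""
    if not (isinstance(a, int) and isinstance(b, int)):
        raise TypeError("÷ is defined only for quantities of integer type")
    q = abs(a) // abs(b)              # raises ZeroDivisionError when b == 0
    return -q if (a < 0) != (b < 0) else q


def power(a, b):
    """a ↑ b with the restriction above: a negative base needs an
    integer exponent."""
    if a < 0 and not isinstance(b, int):
        raise ValueError("a ↑ b with negative a requires b of integer type")
    return a ** b
```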

Other conditions being equal, in an arithmetic expression raising to
a power is performed first, then multiplication and division
(ordinary and integral), then addition and subtraction. Operations of
the same rank (multiplication and division, or addition and
subtraction) are performed in the conventional order, from left to
right. When operations must be performed in a different order, round
brackets are used. In raising to a power, a ↑ b, the expressions a
and b must as a rule be enclosed in brackets. Exceptions are permitted
only when the corresponding quantity (not enclosed in brackets) is an
unsigned number, a variable (with or without indices), or a function
(see below).

Examples of arithmetic expressions are x ↑ e (equal to xᵉ), a ↑ n ↑ k
(equal to (aⁿ)ᵏ), ab × AB + Pr + (−g), (x7 + A9) ↑ (−2), etc.
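
The precedence rules can be made explicit by a small parser that
writes out all the implied brackets. This Python sketch is our own
(precedence climbing); it handles only identifiers, unsigned numbers,
brackets and the six operations, and uses the ASCII hyphen for the
minus sign. It groups every operation, ↑ included, from left to
right, so that a ↑ n ↑ k is read as (a ↑ n) ↑ k, which is how ALGOL 60
reads it; a leading sign governs the whole first term, so −a ↑ 2 means
−(a ↑ 2).

```python
import re

PRECEDENCE = {"+": 1, "-": 1, "×": 2, "/": 2, "÷": 2, "↑": 3}
TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9]*|[0-9]+(?:\.[0-9]+)?|[-+×/÷↑()]")


def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r}")
        tokens.append(match.group())
        pos = match.end()
    return tokens


def parenthesize(text):
    """Return text with every implied pair of brackets written out."""
    tokens, pos = tokenize(text), 0

    def primary():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of expression")
        token = tokens[pos]
        pos += 1
        if token != "(":
            return token
        inner = expression(1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("missing )")
        pos += 1
        return inner

    def expression(min_prec):
        nonlocal pos
        if min_prec == 1 and pos < len(tokens) and tokens[pos] in ("+", "-"):
            sign = tokens[pos]               # leading sign: applies to the first term
            pos += 1
            left = f"({sign}{expression(2)})"
        else:
            left = primary()
        while (pos < len(tokens) and tokens[pos] in PRECEDENCE
               and PRECEDENCE[tokens[pos]] >= min_prec):
            op = tokens[pos]
            pos += 1
            right = expression(PRECEDENCE[op] + 1)   # +1: group left to right
            left = f"({left} {op} {right})"
        return left

    result = expression(1)
    if pos != len(tokens):
        raise SyntaxError("unexpected token " + tokens[pos])
    return result
```

For example, parenthesize("a + b × c") gives (a + (b × c)), and
parenthesize("a ↑ n ↑ k") gives ((a ↑ n) ↑ k).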

Along with variables represented by ordinary identifiers or by array
identifiers, so-called functions are also used in constructing
arithmetic expressions. A function in ALGOL is written as the
identifier assigned to it, followed in round brackets by the so-called
list of actual parameters, in other words, the arguments of the
function. The actual parameters can be any expressions (arithmetic or
Boolean), array identifiers and certain other forms of identifiers
defined below. The parameters are separated from one another either by
commas or by a delimiter of the form ) any commentary: ( . A
commentary is a clarification of the meaning of the actual parameters,
which we shall usually give in Russian. In translation from ALGOL into
the language of a particular computer the commentary is simply
discarded.

Examples of functions are f(x), M(x, y + a), AB(k + 1.5) force: (p)
acceleration: (a). The first of these functions is single-place (i.e.,
it depends on one actual parameter), the second is two-place, and the
third is three-place (it depends on the actual parameters k + 1.5, p
and a).

In descriptions, functions are termed procedures, with an indication
of the type of quantity each defines (integer, real or Boolean). For
such functions as the sine, the logarithm and others we retain the
commonly accepted identifiers sin, ln, etc. We establish the
identifier sqrt for the square root and the identifier abs for the
absolute value.

The descriptions of functions include headings of the form real
procedure sin(x); integer procedure abs(n); Boolean procedure A(a, d).
In the descriptions, so-called formal parameters are used as the
function arguments, i.e., certain identifiers which in subsequent use
of the function can be replaced by any actual parameters: variables,
expressions (arithmetic, Boolean, designational), identifiers of
arrays, procedures or switches, and also so-called strings.

Expressions constructed from numbers, variables of real and integer
type (with or without indices) and functions (integral or real) with
the aid of the arithmetic operations are termed simple arithmetic
expressions. In order to construct more complex arithmetic expressions
we must become acquainted with the so-called Boolean expressions.

Boolean expressions are constructed from the logical values ("true"
and "false"), variables and functions (procedures) of Boolean type,
and the so-called relations, which are two arithmetic expressions A
and B connected with one another by an equality or inequality sign:
A = B, A ≠ B, A > B, A ≥ B, A < B, A ≤ B. The operations used to
construct Boolean expressions are the logical operations described in
Chapter 2: equivalence (≡), implication (⊃), disjunction (∨),
conjunction (∧) and negation (¬). The priority of the logical
operations with respect to one another and the use of brackets (only
round ones in the present case) remain the same as in Chapter 2. We
need only add that the arithmetic operations (expressions) are
considered to take precedence over all relations, and the latter take
precedence over all logical operations, so that the expression
a + b > c × d ⊃ x ∧ y ∨ z must be understood as
((a + b) > (c × d)) ⊃ (x ∧ y ∨ z). The quantities a, b, c, d in this
expression are of real or integer type, and the quantities x, y, z are
Boolean.
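
As a check on this reading, the example can be transcribed into
Python, which has no implication operator. The helper implies is our
own and uses the fact that p ⊃ q is false only when p is true and q is
false; the sketch also assumes, as in Chapter 2, that ∧ binds more
tightly than ∨.

```python
def implies(p, q):
    """Logical implication p ⊃ q."""
    return (not p) or q


def example(a, b, c, d, x, y, z):
    """a + b > c × d ⊃ x ∧ y ∨ z, read as ((a + b) > (c × d)) ⊃ ((x ∧ y) ∨ z)."""
    return implies((a + b) > (c * d), (x and y) or z)
```

With a = b = c = d = 1 the relation 2 > 1 is true, so the whole
expression takes the value of x ∧ y ∨ z; when the relation is false the
expression is true whatever x, y, z are.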

All the Boolean expressions defined so far are termed simple. From
Boolean expressions A, B, C, of which the first, A, is simple, we can
compose a more complex Boolean expression by means of the service
words if, then and else. The corresponding construction looks like

if C then A else B.

The complex Boolean expression thus defined is taken to be A if the
condition C is satisfied (i.e., if the Boolean expression C takes the
value "true"), and B otherwise. Since of the three expressions A, B, C
only A must be simple, the expressions B and C can in turn be composed
with the aid of some condition. Finally, in constructing simple
Boolean expressions it is permissible to use, along with logical
values, variables, relations and functions, any Boolean expressions
(both simple and complex) enclosed in round brackets.

Thus recursive constructions of any depth are possible in the
definition of Boolean expressions. For example, the Boolean
expressions include the expression

If a >b then (Ifa mb +c mo ae ee ee A G(K.E),

where the lower case letters denote variables of real type, the
capital letters denote variables of Boolean type, and G(K, L) is a
function (Boolean procedure) of the actual parameters K and L. If this
whole expression is enclosed in round brackets, it becomes a simple
Boolean expression and as such can be used in further constructions.

The situation is completely analogous for arithmetic expressions:
from two arithmetic expressions A and B, of which the first is
necessarily simple, and a Boolean expression C we can construct the
complex arithmetic expression

if C then A else B.

The value of this expression is taken equal to A if condition C is
satisfied, and equal to B otherwise. Just as for Boolean expressions,
in constructing simple arithmetic expressions it is permissible to use
not only numbers, variables and functions, but also any arithmetic
expressions (simple or complex) enclosed in round brackets. So, for
example, the expressions (if a > 5 then a ↑ 2 else a ↑ 3) or
(ita = then a−d elsea +5) +t (2 −6+2) must be considered simple
arithmetic expressions.
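
Python's conditional expression A if C else B corresponds closely to
if C then A else B, including the point discussed above: an
unbracketed conditional written after else is read as nested there,
while one in the first position would need brackets. Below is a
hypothetical transcription of the first example and of a nested
construction of the form if x > 0 then 1 else if x = 0 then 0 else −1.

```python
def first_example(a):
    """(if a > 5 then a ↑ 2 else a ↑ 3)"""
    return a ** 2 if a > 5 else a ** 3


def sign(x):
    """if x > 0 then 1 else if x = 0 then 0 else -1"""
    return 1 if x > 0 else 0 if x == 0 else -1
```

Because it is bracketed, the first example can itself appear inside a
larger expression, for instance (a ** 2 if a > 5 else a ** 3) + 1.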

All the descriptions presented so far are in essence auxiliary. The
basic means for constructing algorithms in ALGOL are the so-called
operators. ALGOL-60 contains six different types of operators: the
assignment operator, the transfer operator, the empty operator, the
cycle operator, the procedure operator and the conditional operator.
The first five types, in contrast with the last, the conditional
operator, are usually termed unconditional operators.

The assignment operator assigns to particular variables definite
values specified by some arithmetic or Boolean expression A. The
variables to which the value determined by the expression A is
assigned are separated from one another and from this expression by a
special separator, := (the assignment symbol). All these variables
constitute the left part, and the expression A constitutes the right
part, of the assignment operator.

Examples of assignment operators are: A := s[10] := u := n + 1 + p;
m := m + 1; B := a > b; c[j, 2 × k] := 5 − 3 × v ↑ 2.

In executing the assignment operator, a strictly defined order of
operations must be observed. First (from left to right) the values of
the indices of all the variables of the left part are calculated;
these indices are specified by arithmetic expressions, which in this
case are termed subscript expressions. Then the value of the
arithmetic expression in the right part is computed, and the value
obtained is assigned to all the variables of the left part (with the
subscripts already computed). Thus, for example, the operator
A := B := p + q must be performed as z := p + q; B := z; A := z, and
not as B := p + q; A := p + q. The difference lies in the fact that
the value of the arithmetic expression p + q can change with each new
calculation (for example, if it contains a function whose values are
determined by a procedure which changes in the course of its
execution). Therefore, in performing the assignment, the value of the
arithmetic expression must be computed only once, and not separately
for each individual assignment.
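
The prescribed order can be modeled in Python with a hypothetical
helper, multiple_assign, in which each left-part variable is a name
plus an optional function computing its subscript, and the right part
is a function that may have side effects. The sketch only illustrates
the order of evaluation; it does not describe how any particular
translator works.

```python
def multiple_assign(env, targets, right_part):
    """Perform t1 := t2 := ... := right_part in the prescribed order.

    env:        dict of variables (arrays are Python lists)
    targets:    list of (name, subscript), subscript being None or a
                function of env that gives the index
    right_part: function of env computing the value (may have side effects)
    """
    # 1. Subscript expressions of the left part, from left to right.
    resolved = [(name, None if sub is None else sub(env)) for name, sub in targets]
    # 2. The right part, computed exactly once.
    value = right_part(env)
    # 3. The same value goes to every left-part variable.
    for name, index in resolved:
        if index is None:
            env[name] = value
        else:
            env[name][index] = value
    return value


def p_plus_q(env):
    """A right part whose value changes on every call (a counter side effect)."""
    env["calls"] += 1
    return env["p"] + env["q"] + env["calls"]
```

With env = {"p": 1, "q": 2, "calls": 0}, the call
multiple_assign(env, [("A", None), ("B", None)], p_plus_q) evaluates
p_plus_q once, so A and B both receive 4; the rejected order
B := p + q; A := p + q would have given them 4 and 5.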

Operators in ALGOL can be provided with labels, for which any
identifiers or unsigned integers may be used (in the latter case,
prefixing zeros to the number does not alter the label). However, to
facilitate the construction of translators (programs for translation
from ALGOL into machine languages), numbers are frequently avoided as
labels. The label is separated from the operator by a colon. The
operators (labeled or unlabeled) are arranged one after the other and
separated from one another by semicolons, for example p: A; B; kl: C,
where A, B, C are operators and p and kl are labels (the operator B is
unlabeled). The same operator can have not just one but as many labels
as desired (separated from one another by colons), for example
p: A: r7: A (here the operator A has three labels: p, A and r7).

Usually the operators in ALGOL are performed sequentially, one 
after the other, in the order of their writing. Variation in the or- 
der of performance of the operators is accomplished by an operator 
termed the transfer operator. In its simplest form the transfer
operator consists of the service words go to and some label L. Its
action is that, on reaching it, a transfer (jump) is made to the
operator having L as its label.

In the general case the words go to in the transfer operator are
followed by some designational expression. A label is only one of the
simplest examples of designational expressions. A more complex example
is a designational expression composed of two labels, say L and M, and
some Boolean expression C: if C then L else M. The value of this
designational expression is equal to L if the condition C is
satisfied, and to M otherwise. In place of the label M (but not in
place of the label L), any complex designational expression can be
substituted in this expression, and the substitution process can be
continued in the same way.

As a result, complex recursive constructions can arise for the
transfer operator, for example

go to if i = 1 then L else if i = 2 then M else P.

To simplify such constructions the so-called switch transfer is used.
A switch consists of some identifier followed by a so-called index
expression enclosed in index (i.e., square) brackets. The index
expression is any arithmetic expression, whose computed value must
always be rounded off to the nearest integer.

If, for example, the switch identifier is s and the index expression
is the variable (expression) i, then the switch transfer operator is
written go to s[i]. By itself such an expression does not yet have
any meaning. To give it meaning, in addition to the expression s[i],
which we shall term the switch indicator and consider a simple
designational expression, we must also introduce the so-called switch
description, usually placed together with the descriptions of the
types of variables, arrays and procedures (functions). The switch
description begins with the service word switch, followed by the
switch identifier, then the assignment symbol := and, finally, the
so-called switch list, i.e., a list of designational expressions
separated from one another by commas. For example: switch s := L, M, P
(where L, M, P are labels).

On encountering a switch transfer operator, for example go to s[i],
we compute the corresponding index expression, substituting in it the
current values of the variables (say, i = 2). After this we turn to
the description of the switch with the same identifier s and
accomplish the transfer to the designational expression in this
description whose sequential number in the list coincides with the
value found for the index expression. In the case considered the
transfer is made to the label M, i.e., to the second element of the
switch list.

If the value of the index expression in the switch indicator cannot
be calculated (because values have not yet been assigned to certain
variables) or if this calculation leads to a number which is not the
number of any element of the switch list, then the transfer operator
is not performed and the operator following it is performed
immediately. In the example considered above, values of the index
expression equal to 4, 0 or -1 do not lead to any transfer. However,
values of the index expression equal to 2.2 or 2.7 lead (after
roundoff) to transfers to the second or, correspondingly, the third
element of the switch list (i.e., to the labels M or P).
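The switch rule just described can be sketched in Python. This is a minimal model under stated assumptions: a switch is a Python list of label names, an index that cannot be computed is represented by None, and rounding is done as floor(x + 0.5), which is how ALGOL 60 rounds real values assigned to integer quantities. The function name switch_target is hypothetical.

```python
import math
from typing import Optional, Sequence


def switch_target(switch_list: Sequence[str], index: Optional[float]) -> Optional[str]:
    """Return the label chosen by 'go to s[index]', or None when the
    transfer is skipped and the following operator is performed instead."""
    if index is None:
        # The index expression could not be evaluated (undefined variables).
        return None
    k = math.floor(index + 0.5)  # round to the nearest integer
    if 1 <= k <= len(switch_list):
        return switch_list[k - 1]  # switch list elements are numbered from 1
    return None  # no element with this number: fall through


s = ["L", "M", "P"]  # switch s := L, M, P
for value in (2, 2.2, 2.7, 4, 0, -1):
    print(value, switch_target(s, value))
```

Run as written, this prints M, M and P for the first three index values and None for 4, 0 and -1, matching the cases discussed in the text.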

A label, a switch indicator or any designational expression enclosed
in round brackets is a simple designational expression. From two
designational expressions A and B (of which the first is necessarily
simple) and a Boolean expression C we can compose the complex
designational expression if C then A else B, which coincides with the
expression A when the condition C is satisfied and with the expression
B otherwise. Thus, for designational expressions exactly the same
recursive constructions are possible as for arithmetic (or Boolean)
expressions.

The third type of operator used in ALGOL is the so-called empty
operator, which performs no operation and is denoted by an empty
string of symbols. Usually the empty operator is provided with a label
and serves for returning (by means of a transfer operator using this
label) to the required segment of the program. The empty operator with
a label is indispensable, for example, when it is necessary to
transfer from the middle of a program to its end (not to the following
nonempty program operator but precisely to the end of the program). As
in the other operators, a colon must be placed after the label of the
empty operator.

Of very great value in the construction of programs in the ALGOL
language are the so-called cycle operators, which cause some operator
(or group of operators) to be performed a number of times in
succession. The cycle operator consists of the cycle heading and the
operator itself (which can be any operator), which is performed
repeatedly in the cycling process.

The cycle heading begins with the service word for and terminates
with the service word do. After the word for stands the identifier of
the variable which changes in the course of the cycle. This variable is
termed the cycle parameter. Following it, after the assignment symbol
:=, comes the so-called cycle list, i.e., the list of the values which
the variable must take during the cycle. The cycle list consists of one
or several elements separated from one another by commas. In the
simplest case arithmetic expressions (in particular, simply numbers)
are used as the elements of the cycle list. For example, the cycle
operator for i := 1, 2, 3 do a[i] := i↑2 performs the sequential
assignments a[1] := 1; a[2] := 4; a[3] := 9. The length of the cycle in
this case is equal to 3.

If the cycle parameter must take not three but, say, a thousand
different values, then listing all these values in the cycle list would
be excessively cumbersome. In this case we use as the cycle list
elements special constructions, each of which immediately gives some
set of values of the cycle parameter.

In ALGOL two types of such constructions are used. The first type is
constructed with the service words step and until and has the form
A step B until C, where A, B and C are arithmetic expressions. An
element of the cycle list of this type gives the values of the cycle
parameter as follows: in the first step the cycle parameter is assigned
the value of the arithmetic expression A, in the second step the value
$A_2 = A + B$, in the third the value $A_3 = A_2 + B$, and so on, until
the next value $A_n = A_{n-1} + B$ exceeds the value of the arithmetic
expression C.* This value is not assigned to the cycle parameter and
the cycle is not performed for it. The cycle list is considered to be
exhausted and, consequently, control is transferred to the operator
directly following the cycle operator.

As an example of the construction described, let us consider the
cycle operator for i := 12, 4 step -1 until 0, -5 do a[i] := i + 10.
This operator performs the sequential assignments a[12] := 22;
a[4] := 14; a[3] := 13; a[2] := 12; a[1] := 11; a[0] := 10;
a[-5] := 5. We note that in this example the cycle list element
4 step -1 until 0 describes an arithmetic progression (with difference
equal to -1); in the general case, however, the step represented by
the arithmetic expression B (standing after the word step) can be a
variable, changing with every new repetition of the cycle.
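The plain and step-until kinds of cycle list element can be modeled with Python generators. This is a simplified sketch: step_until and cycle_list are hypothetical names, and A, B, C are taken as fixed numbers, so a step that changes during the cycle is not modeled. The stopping test compares the current value with C in the direction of the step, which covers both the positive-step case described above and the negative step of the example.

```python
from typing import Iterable, Iterator, Union

Number = Union[int, float]


def sign(x: Number) -> int:
    return (x > 0) - (x < 0)


def step_until(a: Number, b: Number, c: Number) -> Iterator[Number]:
    """Values of the cycle list element 'a step b until c'."""
    v = a
    # Stop once v has passed c in the direction of the step.
    # (A zero step never passes c, so it would cycle forever.)
    while (v - c) * sign(b) <= 0:
        yield v
        v = v + b


def cycle_list(*elements: Union[Number, Iterable[Number]]) -> Iterator[Number]:
    """Run through the elements of a cycle list from left to right."""
    for e in elements:
        if isinstance(e, (int, float)):
            yield e          # a plain arithmetic expression: one value
        else:
            yield from e     # a step-until element: a run of values


# for i := 12, 4 step -1 until 0, -5 do a[i] := i + 10
a = {}
for i in cycle_list(12, step_until(4, -1, 0), -5):
    a[i] = i + 10
print(a)
```

The resulting dictionary holds exactly the seven assignments listed above, in the same order.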

The second type of cycle list element is given with the aid of an
arithmetic expression A, a Boolean expression B and the service word
while, written in the sequence A while B. This element assigns to the
cycle parameter, in sequence, the values taken by the arithmetic
expression A as long as the condition B is satisfied (i.e., as long
as the expression B has the value "true"). If at some succeeding
repetition of the cycle the condition B ceases to be satisfied, then
the cycle operator is not performed and control is transferred to the
operator directly following it. We note that the word step must not be
used in this construction, so that an expression of the type
A step B while C is not encountered in ALGOL.
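A corresponding sketch of the element A while B, again with hypothetical names: A and B are passed as Python functions so that both are re-evaluated before every repetition, which is what allows the cycle body to influence when the element is exhausted.

```python
from typing import Callable, Iterator


def while_element(value: Callable[[], float],
                  condition: Callable[[float], bool]) -> Iterator[float]:
    """Values of the cycle list element 'A while B'."""
    while True:
        v = value()            # evaluate A afresh
        if not condition(v):   # B false: the element is exhausted
            return
        yield v                # v becomes the cycle parameter


# Roughly: for v := t while v < 100 do begin ...; t := 2 × v end
state = {"t": 1}
seen = []
for v in while_element(lambda: state["t"], lambda v: v < 100):
    seen.append(v)
    state["t"] = 2 * v         # the body changes what A yields next
print(seen)
```

Because the body doubles t on each pass, the parameter runs through 1, 2, 4, ..., 64 and the element is exhausted when A first yields 128.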

With the method described above for the construction of the cycle
operator, we can repeat only the single operator which immediately
follows the word do. If it is required to repeat in a cycle not one
operator but some sequence of operators $A_1, A_2, \ldots, A_s$, then
this sequence is enclosed in special operator brackets and is
thereafter treated as a single complex operator.

As the operator brackets we use the pair of service words begin and
end, so that the complex operator is written
begin $A_1$; $A_2$; $\ldots$; $A_s$ end. Complex operators, like
ordinary ones, can be provided with labels (one or several).

Along with complex operators, ALGOL uses so-called blocks, which
differ from complex operators in that, ahead of the operators appearing
in the block, directly after the word begin, there is placed a
description of the types of certain quantities (identifiers) which are
encountered in this block. In this case the quantities described in
the block are localized in the given block only and, generally
speaking, lose their values (become indeterminate) on departure from
the block. If we wish to retain the values of certain of the quantities
described in the block after departure from it, in order to use them on
repeated entry into the block, then the word own is added to the
description of their types. An example of a block:
begin own real x; integer n; n := s + i; x := a + n end.
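The difference between own and ordinary local quantities can be imitated in Python with a closure. This is only an analogy: the block modeled below is hypothetical (it accumulates x := x + n so that the retained value is visible), and the own variable is started at 0.0 for the sketch, whereas ALGOL 60 itself does not define an initial value for own variables.

```python
from typing import Callable


def make_block() -> Callable[[int], float]:
    """Model a block 'begin own real x; integer n; ...; x := x + n end'."""
    own_x = [0.0]              # storage for the own variable x, kept between entries

    def enter(n_value: int) -> float:
        n = n_value            # ordinary local quantity: created afresh on each entry
        own_x[0] = own_x[0] + n
        return own_x[0]        # value of x on leaving the block

    return enter


block = make_block()
print(block(1), block(2), block(3))
```

Successive entries print 1.0, 3.0 and 6.0: x carries over from one entry to the next, while n is rebuilt each time.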

We note that both complex operators and blocks can include other
blocks and complex operators, permitting such constructions to be
nested to any depth. The identifiers used within a block to designate
its own (local) quantities can be used outside the block to designate
any other quantities which are not accessible to this block (i.e.,
which do not figure in the given block and are not transformed in it).
Identifiers which are not described in a block cannot be localized in
it and, consequently, represent the same objects both inside the block
and outside of it.

Labels are always assumed to be localized within the block in which
they occur, so that a block can be entered only through its beginning.
No transfer operator located outside a block can transfer control to
any operator within this block.

In exactly the same way, a transfer to a label located within a cycle
operator cannot be made by a transfer operator acting from outside the
cycle. We note also that on exit from the cycle operator as a result of
exhaustion of the cycle list, the value of the cycle parameter is
considered indeterminate. If, however, the exit from the cycle is made
by a transfer operator contained in the operator (or block) which is
repeated in the given cycle (i.e., standing after the word do), then
the cycle parameter retains the value it had immediately before the
transfer operator was performed.

All the simple operators described above, the procedure operator
described below, and all the complex operators and blocks built from
them form the system of so-called unconditional operators. In addition,
ALGOL introduces two types of conditional operators, using the
condition if B then (where B is a Boolean expression), which is
analogous to the condition used in the construction of complex
arithmetic, Boolean and designational expressions.

The operator "if" is constructed from the condition described above
and a following unconditional operator, which is performed when the
condition is satisfied and is bypassed (not performed) otherwise.
Example: if a > b then begin A := n; go to L end. The complex operator
begin A := n; go to L end is performed if and only if the condition
a > b is satisfied.

The conditional operator proper is obtained by adding to the operator
"if" the service word else and a following arbitrary operator (possibly
also conditional), which is performed when the condition in the "if"
operator is not satisfied. The general structure of the conditional
operator thus has the form

if B then $A_1$ else $A_2$,

where B is any Boolean expression, $A_1$ is an unconditional operator
and $A_2$ is any operator.

The so-called procedure operators are of essential importance in the
construction of ALGOL. A procedure is an ensemble of operators
designated by some identifier, termed the procedure identifier. In
ALGOL procedures play the same role as subroutines in conventional
programming, speeding up the compilation of complex programs through
the use of previously compiled standard programs. The body of a
procedure (the actual text of the operators composing it) can be
written either in the ALGOL language or directly in the language of the
corresponding universal digital machine.

The procedure operator (unless we are dealing with the familiar
standard procedures) must be described in advance. This description is
made with the service word procedure, followed by the so-called
procedure heading, i.e., the procedure identifier and, after it (in
round brackets), a list of the so-called formal parameters of the
procedure, i.e., identifiers separated from one another by special
delimiters, namely commas or delimiters of the form ) letter string: (.
For example, procedure sin (x) or procedure A (x, y) pressure: (p). The
first procedure has one formal parameter, x, and the second has three
formal parameters, x, y and p. Procedures without parameters are also
possible; their headings consist only of the procedure identifier, not
followed by brackets. The procedure itself (the so-called body of the
procedure) is written after the procedure heading in the form of some
operator.

The procedure operator itself, or more exactly the procedure call
operator, is written in the same form as in the procedure description,
but now without the word procedure ahead of it and with the formal
parameters of the procedure replaced by its so-called actual
parameters. The brackets and delimiters are the same as in the
description of the corresponding procedure. The performance of the
procedure operator consists in assigning to all the formal parameters
the values of the corresponding actual parameters (or replacing the
formal parameters by the actual ones) and then performing the
procedure.

As actual parameters one can use any expressions (arithmetic, Boolean
or designational), array identifiers and switch identifiers,
identifiers of any procedures and, finally, the so-called lines
(strings).

Lines are any sequences of symbols enclosed in special "line"
brackets (opening and closing string quotes). These brackets can also
be used within a line. Lines can be used as actual parameters only of
those procedures which are written in machine code, not in the ALGOL
language. Most frequently their use is limited to the special procedure
punch (x), which prints or punches the actual parameters substituted in
place of the formal parameter x. If, in particular, some line is
substituted in place of x, then the procedure punch outputs all the
symbols of this line from the machine for printing or punching.
Therefore a line can contain not only ALGOL symbols but any other
symbols which the printing or punching device in question is capable
of producing.

The procedure description also indicates the types of its formal
parameters. When actual parameters are substituted, their types must
coincide with the types of the corresponding formal parameters. To
avoid ambiguity in such a substitution, it is usually necessary to
rename those identifiers localized within the procedure which coincide
with identifiers occurring in the actual parameters being substituted.

We note that the procedure parameters include, generally speaking,
both the input and the output quantities (those obtained as a result of
performing the procedure). If performing the procedure yields only one
quantity (a number or a logical value), then it is natural to denote
this quantity by the identifier of its procedure (together with the
list of actual parameters). In this case the corresponding procedure is
termed a function (see above), and in its description there is placed,
ahead of the word procedure, the word designating the type of the output
quantity of this procedure. Example: real procedure sin (x). It often
happens that some formal parameter x participates in several
transformations within the procedure. For example, when the sine is
calculated by a series, the parameter x in the procedure sin(x) is
raised successively to the powers 3, 5, 7, etc. If in the substitution
this parameter is replaced by a fairly complex expression (actual
parameter), say x := (a + b) × (a - b), then in the course of the
procedure this expression may have to be recomputed every time some
operation is performed with the parameter x. Naturally it is simpler
to compute the value of x ahead of time (prior to entry into the
procedure) and substitute this value in its place.

In the automatic translation of a program from the ALGOL language into
machine language, it is necessary in every case to inform the
translator (programming program) which parameter values must be
computed prior to substitution into the procedure. All such parameters
are marked in the description with the special service word value and
are listed after the formal parameter list of the procedure heading,
before the description of their types (the so-called specifications).
For example, in place of the description real x; integer n there may
appear the description value n; real x; integer n.
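The point of value can be seen by counting evaluations. In the sketch below (hypothetical names throughout), a parameter that is not marked value is modeled as a parameterless Python function that is re-evaluated at every use, which is how ALGOL 60 treats such "name" parameters; a value parameter is computed once before entry. The procedure body uses x in the three terms x↑3, x↑5 and x↑7, as in the series example above.

```python
from typing import Callable

calls = [0]


def actual_parameter() -> int:
    """The actual parameter (a + b) × (a - b) with a = 2, b = 1; counts evaluations."""
    calls[0] += 1
    a, b = 2, 1
    return (a + b) * (a - b)


def powers_by_name(x: Callable[[], int]) -> int:
    # Each use of x re-evaluates the actual parameter.
    return x() ** 3 + x() ** 5 + x() ** 7


def powers_by_value(x: int) -> int:
    # 'value x': the actual parameter was computed once, before entry.
    return x ** 3 + x ** 5 + x ** 7


calls[0] = 0
r1 = powers_by_name(actual_parameter)
by_name_calls = calls[0]
calls[0] = 0
r2 = powers_by_value(actual_parameter())
by_value_calls = calls[0]
print(r1, by_name_calls, r2, by_value_calls)
```

Both versions give the same result (2457), but the actual parameter is evaluated three times in the first and only once in the second.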

We shall usually supplement ALGOL with two procedures, not defined in
the descriptions, for the input of information into the machine and its
output from the machine. The first procedure is always given the
identifier read, and the second the identifier punch; the actual
parameters of each of these procedures will be either quantities of
type real, integer or Boolean, or array identifiers of any of these
types.

We shall make some other remarks. Normally every program in ALGOL
is represented in the form of a block, i.e., is enclosed in the
statement brackets begin and end. To facilitate the reading of ALGOL
programs, so-called commentaries can be introduced into them, i.e.,
clarifications for the programmer, which have no meaning in the ALGOL
language and are therefore ignored by the translator in automatic
programming.

A commentary is any sequence of symbols (not necessarily ALGOL
symbols) beginning with the service word comment after a semicolon or
the word begin, terminating with a semicolon and not containing any
other semicolon. Any sequence of symbols following the word end up to a
semicolon, or to the end of the program, is also considered a
commentary if it does not contain the words end and else or a
semicolon. For example, in each of the expressions "comment text;",
"begin comment text;" and "end text;" the word "text" is a commentary.
From the point of view of ALGOL programs, the first expression
(occurring after a semicolon) is equivalent to an empty place, the
second to the word begin, and the third to end;.
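The three commentary rules can be sketched as a token filter. This is a simplified illustration, not an actual translator's algorithm: tokens are obtained by splitting on white space (with each semicolon made a separate token), and string quotes are ignored. The function name strip_comments is hypothetical.

```python
from typing import List


def strip_comments(source: str) -> List[str]:
    """Remove ALGOL 60 commentaries from a whitespace-separated program text."""
    tokens = source.replace(";", " ; ").split()
    out: List[str] = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "comment" and out and out[-1] in (";", "begin"):
            # 'comment' after ';' or 'begin': skip up to and including the next ';'
            while i < len(tokens) and tokens[i] != ";":
                i += 1
            i += 1
            continue
        out.append(tok)
        i += 1
        if tok == "end":
            # After 'end': skip everything up to 'end', 'else' or ';' (which is kept)
            while i < len(tokens) and tokens[i] not in ("end", "else", ";"):
                i += 1
    return out


print(strip_comments("begin comment text; x := 1; comment more text; y := 2 end text"))
```

The printed token list is begin x := 1 ; y := 2 end: both comment forms and the text after the final end have been dropped.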

For electronic digital machines of small and medium capacity, the
ALGOL-60 language is too complex to permit organizing effective
translation from it into machine language. Therefore a simplified
variant of ALGOL has been proposed, termed SMOLGOL-61.*

The simplification amounts to the following. First, the alphabet is
limited to either only lower-case or only capital letters of the Latin
alphabet. Second, the logical operations of implication and equivalence
are excluded, as are the service words while, Boolean, true and false.
Thus the use of the logical values "true" and "false" is not permitted,
and identifiers may not be used to designate logical quantities.
Boolean variables can be introduced by the programmer indirectly, by
replacing the logical values "true" and "false" with the whole numbers
1 and 0. Boolean expressions are permitted only in conditions. If in
ALGOL the value of some Boolean expression B was assigned to some
identifier P, P := B, then in SMOLGOL this must correspond to an
assignment of the form P := if B then 1 else 0, where the right side is
now an arithmetic rather than a Boolean expression.

Further, the length of identifiers is limited to five letters. More
exactly, identifiers whose first five letters coincide are considered
identical in SMOLGOL. In the arithmetic expression a ↑ b, negative
values of the exponent b are not permitted when a and b are quantities
of integer type. Whole numbers are not used as labels. The step in the
cycle operator must either remain positive at all times or remain
negative at all times, and in the latter case the minus sign must be
placed explicitly ahead of the expression which specifies the step. A
cycle list must contain only one step-until element.

In all procedures except the input and output procedures, lines cannot
be used as actual parameters. No procedure can be called before it has
been described. Using one procedure within another is excluded if they
were described in the same block. A second call of a procedure is
forbidden until its first call has been completely terminated. For
example, recursive calls of the procedure F(u, v) of the form
F(x, F(x, y)) cannot be used. But repeated use of a procedure after its
termination is not prohibited, so that the expression ln(ln x) is
completely acceptable in SMOLGOL. If a procedure P is an actual
parameter of another procedure, then all the parameters of the
procedure P must be described as value.

Standard procedures for finding the sign and absolute value of a
number cannot be used as actual parameters of any procedures. If it is
necessary to use them in this fashion, then they must first be
described as functions, i.e., one must write out
real procedure abs(x); value x; real x; begin abs := abs(x) end
(and similarly for integer procedure sign (x)). Neither procedures nor
their formal parameters can be of the Boolean type.

Variables cannot be referred to in portions of the program outside the
block in which they are defined. The bounds of arrays must be
constants. The elements of the switch lists in switch descriptions can
be only labels, not arbitrary designational expressions.

The descriptions of the procedures which are called in a block must
come after the descriptions of the types, switches and arrays of that
block. Procedure identifiers can appear within a procedure only as the
left parts of the corresponding assignment operators. Some other
limitations also exist.
We note that any program written in SMOLGOL can also be considered an
ALGOL program. Generally speaking, the reverse is not true.
§5. EXAMPLES OF PROGRAMMING USING ALGOL-60. 

Let us consider first a very simple example which has already been
used in §3 of the present chapter to illustrate the principles of
programming in machine languages: the calculation of the sum
$S = \sum_{k=1}^{m} 1/k^2$. The corresponding ALGOL program can be
written in the form

    begin real s; integer m, k;
       read (m); s := 0;
       for k := 1 step 1 until m do
          s := s + 1/k↑2;
       punch (s)
    end

The procedures read(m) and punch(s) provide, respectively, for the
input of the required information (the upper limit of summation) and
the output of the result, i.e., the value of the desired sum. It is
easy to see that the program written in ALGOL is far clearer and more
readable than the machine program written in §3 to solve the same
problem.
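For comparison, here is a line-by-line Python transcription of the ALGOL program. It is a sketch: the function argument m and the return value stand in for read(m) and punch(s), and the name sum_inverse_squares is hypothetical.

```python
def sum_inverse_squares(m: int) -> float:
    """Compute s = 1/1^2 + 1/2^2 + ... + 1/m^2."""
    s = 0.0                      # s := 0
    for k in range(1, m + 1):    # for k := 1 step 1 until m do
        s = s + 1 / k ** 2       # s := s + 1/k↑2
    return s                     # punch (s)


print(sum_inverse_squares(10))
```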

Programs for the other examples considered in §3 can also be written
quite lucidly. The computation of the scalar product of two (real)
vectors $(a_1, a_2, \ldots, a_n)$ and $(b_1, b_2, \ldots, b_n)$ is
represented in the form

    begin integer n; read (n);
       begin real s; integer i; real array a[1:n], b[1:n];
          read (a); read (b); s := 0;
          for i := 1 step 1 until n do
             s := s + a[i] × b[i];
          punch (s)
       end
    end
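The same computation as a Python sketch (the function name is hypothetical, and the two lists play the part of the arrays a and b read in by read(a) and read(b)):

```python
from typing import Sequence


def scalar_product(a: Sequence[float], b: Sequence[float]) -> float:
    """s = a[1]×b[1] + ... + a[n]×b[n], with Python indices starting at 0."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length n")
    s = 0.0
    for i in range(len(a)):      # for i := 1 step 1 until n do
        s = s + a[i] * b[i]      # s := s + a[i] × b[i]
    return s


print(scalar_product([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```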


Multiplication of the vector $(b_1, b_2, \ldots, b_n)$ by the matrix
$\|A_{ik}\|$ can be represented in ALGOL by the program

    begin integer n; read (n);
       begin integer i, k; real array s[1:n], b[1:n], A[1:n, 1:n];
          read (b); read (A);
          for k := 1 step 1 until n do
             begin s[k] := 0;
                for i := 1 step 1 until n do
                   s[k] := s[k] + b[i] × A[i, k]
             end;
          punch (s);

In this program one cycle operator occurs within another. Inner
operator brackets are introduced because in the first (outer) cycle a
sequence of two operators must be performed.

The two phrases written at the end of the program are a commentary,
which does not actually enter into the program and is ignored by the
translator (programming program).
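A Python sketch of the same nested cycle, computing $s_k = \sum_{i=1}^{n} b_i A_{ik}$. The function name is hypothetical, and the matrix is a list of rows, so A[i][k] corresponds to A[i, k].

```python
from typing import List, Sequence


def vector_times_matrix(b: Sequence[float], A: Sequence[Sequence[float]]) -> List[float]:
    """Row vector b multiplied by the n×n matrix A."""
    n = len(b)
    s = [0.0] * n
    for k in range(n):               # outer cycle over the result components
        s[k] = 0.0                   # s[k] := 0
        for i in range(n):           # inner cycle: s[k] := s[k] + b[i] × A[i, k]
            s[k] = s[k] + b[i] * A[i][k]
    return s


print(vector_times_matrix([1.0, 2.0], [[1.0, 2.0], [3.0, 4.0]]))
```

For b = (1, 2) and the 2×2 matrix shown, the result is [7.0, 10.0].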

We note that the ALGOL language is a universal algorithmic language
and is therefore suitable for writing any algorithm. In addition, as
was noted above, all programs written in ALGOL can be realized (given a
sufficiently large memory) by any universal electronic digital
machine.

We shall use this last circumstance to show that universal electronic
digital machines can perform not only conventional algorithms but also
algorithms with random transfers and any self-organizing systems of
algorithms.

To be able to construct any desired random algorithm in ALGOL, it is
sufficient to introduce into it a special procedure, which we shall
designate random (a, b). On each call this procedure generates some
random number belonging to the segment [a, b]. Here it is assumed that
the selection is made according to a uniform distribution law, under
which all the numbers of the indicated segment are considered equally
probable. The random numbers themselves are assumed to be of integer or
real type, depending on the type assigned to the formal parameters
(segment bounds) a, b. Of course, both these parameters must be of the
same type.
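A minimal Python sketch of such a procedure, built on Python's standard random module rather than on a table or a hardware unit. It is named random_ab here (a hypothetical name chosen to avoid clashing with the module); integer bounds use randint, real bounds use uniform, and mixed types are rejected, matching the requirement that a and b have the same type.

```python
import random
from typing import Optional, Union

Number = Union[int, float]


def random_ab(a: Number, b: Number, rng: Optional[random.Random] = None) -> Number:
    """Equiprobable random number from the segment [a, b]."""
    if type(a) is not type(b):
        raise TypeError("segment bounds a and b must be of the same type")
    gen = rng if rng is not None else random
    if isinstance(a, int):
        return gen.randint(a, b)     # each integer in [a, b] equally likely
    return gen.uniform(a, b)         # uniform distribution on [a, b]


rng = random.Random(2024)            # fixed seed for a reproducible run
print([random_ab(1, 6, rng) for _ in range(5)], random_ab(0.0, 1.0, rng))
```

Passing an explicit random.Random instance makes runs repeatable, which is convenient when testing a random algorithm.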

The method of constructing the procedure itself can vary widely. We
can, for example, simply write a table of random numbers into the
machine memory and construct the procedure for their sequential
selection. In many cases a special random number unit is attached to
the electronic digital machine. In this case the procedure consists in
selecting the numbers generated by this unit and then transforming them
to reduce them to the given interval [a, b].

Wide use is also made of various procedures which generate sequences
of so-called pseudorandom numbers. To form such a sequence we can use,
for example, the following technique: some positive number $a_0$ is
selected and squared. In the resulting number $a_0^2$ some group of
digits is selected (usually neither the highest nor the lowest). The
number $b_1$ formed by these digits is taken as the first pseudorandom
number. Squaring the number $b_1$, we obtain the new number $b_1^2$,
which we treat just as we did the number $a_0^2$. Continuing this
process, we obtain the required sequence of pseudorandom numbers.
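The technique just described is von Neumann's middle-square method. A minimal Python sketch, assuming 4-digit numbers and taking the middle four digits of the square padded to eight digits (the digit count and the function name are illustrative choices, not taken from the text). The seed 2500 shows the kind of cycling discussed next: its square, 06250000, has middle digits 2500 again.

```python
from itertools import islice
from typing import Iterator


def middle_square(seed: int, digits: int = 4) -> Iterator[int]:
    """Yield pseudorandom numbers b1, b2, ... by the middle-square method."""
    if not 0 <= seed < 10 ** digits:
        raise ValueError("seed must have at most `digits` digits")
    x = seed
    start = digits // 2
    while True:
        square = str(x * x).zfill(2 * digits)   # pad the square to 2*digits characters
        x = int(square[start:start + digits])   # keep the middle digits
        yield x


print(list(islice(middle_square(1234), 5)))
print(list(islice(middle_square(2500), 3)))
```

Starting from 1234 this gives 5227, 3215, 3362, 3030, 1809; starting from 2500 it repeats 2500 forever.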

A sequence constructed in this fashion, if it is not too long, can be
considered practically random. With a long sequence, however, various
sorts of cycling occur (cyclic repetitions of previously encountered
pieces of the sequence), and this is what distinguishes pseudorandom
sequences from purely random ones. Nevertheless, for each concrete case
one can select a procedure for generating a pseudorandom sequence which,
for the purposes of that case, is practically indistinguishable from a
purely random sequence.

With the aid of these procedures the problem of realizing any random
algorithm on universal electronic digital machines is completely
resolved. The problem of realizing self-organizing systems of
algorithms on the machines is actually even simpler, since in this case,
as a rule, we do not have to resort to any special procedures. Such a
realization is accomplished by the usual methods, with the aid of
programs written in the ALGOL language. We shall present examples of
self-organizing systems of the sort described in the preceding chapter.

As the first example let us consider the self-adaptive control
algorithm based on the method of steepest descent. The task of this
algorithm is to generate arrays A[1:n] of numbers (control actions)
such that the criterion f has the smallest possible value. The
criterion f is a known function of certain parameters (readings) of the
instruments which monitor the process; their values, in the form of the
corresponding array B[1:m], are periodically introduced into the
algorithm.

The parameters composing the array B (control results) vary as a
result of the variation of the control actions (array A) and also as a
result of other factors which relate to the controlled process and do
not depend on the control algorithm. These factors reduce to the
variation of certain uncontrolled parameters, the nature of which is
not known ahead of time to the control algorithm. It is easy to see
that the described control algorithm accomplishes extremal regulation
(with respect to the criterion f), whose quality will be the better,
the smaller the ratio of the time the algorithm needs to determine the
optimal control actions (array A) to the average time over which an
appreciable variation of the uncontrolled parameters occurs. It is not
difficult to verify that the control algorithm in question can be
described in the ALGOL language by the following program:


    begin integer i, j, k; real y, s, r;
       real array B[1:m], P[1:n];
    L: read (B); y := f(B); s := 0;
       for i := 1 step 1 until n do
          begin A[i] := A[i] + d;
             punch (A); read (B);
             P[i] := (f(B) - y)/d;
             s := s + P[i]↑2;
             A[i] := A[i] - d
          end;
       r := d/sqrt(s);
       for j := 1 step 1 until n do
          A[j] := A[j] - r × P[j];
       punch (A); read (B);
       if abs(y - f(B)) > ε then go to L else
       for k := 1 step 1 until n do
          A[k] := A[k] + r × P[k];
       punch (A);
       go to L
    end


In this program the quantity d is the steepest descent step and the
quantity ε is the accuracy with which the minimal value of the
criterion f is to be achieved. The function sqrt(s) is equal to the
positive square root of s. The array P[1:n] gives the relative
magnitudes of the optimal increments of the control actions A[1:n] at
each step of the steepest descent process. It is assumed that the
quantities d and ε, the real procedures f(B) and sqrt(s), and the
initial values of the components of the array A[1:n] were introduced
into the algorithm previously (prior to the instruction with the label
L).
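The core of the program can be tried out in Python against a simulated process. In this sketch (hypothetical names, not the book's code) the sequence punch(A); read(B); f(B) is collapsed into one function measure(A), the process is a fixed quadratic bowl with its minimum at (1, -2), and the loop runs for at most max_cycles instead of forever. As in the listing, P holds the finite-difference gradient estimate and each step has length d and goes against P; where the listing undoes an insignificant step and resumes monitoring, the sketch undoes it and stops.

```python
import math
from typing import Callable, List


def steepest_descent(A: List[float], measure: Callable[[List[float]], float],
                     d: float, eps: float, max_cycles: int = 10_000) -> List[float]:
    n = len(A)
    for _ in range(max_cycles):
        y = measure(A)                          # L: read(B); y := f(B)
        P = [0.0] * n
        for i in range(n):                      # probe each control action
            A[i] += d
            P[i] = (measure(A) - y) / d         # P[i] := (f(B) - y)/d
            A[i] -= d
        s = sum(p * p for p in P)
        if s == 0.0:
            break                               # flat: no descent direction
        r = d / math.sqrt(s)                    # r := d/sqrt(s), a step of length d
        for j in range(n):
            A[j] -= r * P[j]                    # move against the gradient
        if abs(y - measure(A)) <= eps:
            for k in range(n):
                A[k] += r * P[k]                # undo an insignificant step
            break
    return A


def bowl(a: List[float]) -> float:
    return (a[0] - 1.0) ** 2 + (a[1] + 2.0) ** 2


print(steepest_descent([0.0, 0.0], bowl, d=0.01, eps=1e-9))
```

With a fixed step length the controls end up oscillating within about d of the minimum, so the printed values lie close to (1, -2) rather than exactly on it.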

Now let us consider the algorithm which performs the operation of a
discrete α-perceptron P. Let us assume that the perceptron P has a
retina consisting of N receptors and is designed for the recognition of
two patterns. The A-elements are (1, 1, 1)-neurons, the reward constant
is equal to unity, and the penalty constant is equal to zero. The image
projected onto the retina is the Boolean array r[1:N], which is read in
from outside each time a new image is shown to the perceptron. Also
read in from outside is the Boolean quantity a, which is the signal
indicating the correctness (truth) or incorrectness (falsity) of the
response p given by the perceptron. The response p is simply the number
of the pattern to which the perceptron assigns each image shown to it.
We use sf and ss to designate the output signals of the summators of
the first and second patterns.

Let us assume further that the number of neurons of both the first
and the second pattern is equal to n. We use xf[i] and yf[i] to denote
the numbers of the retina receptors to which the exciting and
inhibiting inputs, respectively, of the ith neuron of the first pattern
are connected; xs[i] and ys[i] to denote the corresponding receptor
numbers for the ith neuron of the second pattern; and vf[i] and vs[i]
to denote the weights of the ith neurons of the first and second
patterns (i = 1, 2, ..., n). With these assumptions, the algorithm
which performs the work of the perceptron P (in the learning regime)
can be written in the form
begin integer p, i, j, k; real sf, ss; Boolean a;
  Boolean array r[1:N];
L: read(r); sf := 0; ss := 0;
  for i := 1 step 1 until n do
  begin if r[xf[i]] ∧ ¬r[yf[i]] then
      sf := sf + vf[i];
    if r[xs[i]] ∧ ¬r[ys[i]] then
      ss := ss + vs[i]
  end;
  if sf > ss then p := 1;
  if sf < ss then p := 2;
  if sf = ss then go to L;
  punch(p); read(a);
  if a ∧ (p = 1) then for j := 1 step 1 until n do
    if r[xf[j]] ∧ ¬r[yf[j]] then vf[j] := vf[j] + 1;
  if a ∧ (p = 2) then for k := 1 step 1 until n do
    if r[xs[k]] ∧ ¬r[ys[k]] then vs[k] := vs[k] + 1;
  go to L
end


It is assumed that the arrays xf[1:n], yf[1:n], xs[1:n], ys[1:n], vf[1:n] and vs[1:n] are introduced into the program ahead of time and that the last two arrays (the weights of the neurons of the first and second patterns) do not consist of zeros only. Otherwise the pattern summators would continuously generate signals sf and ss equal to zero, and, since the program assumes that the perceptron generates no signal p when the summator signals are equal, the learning process (and, in general, any variation of the weights) would never take place.
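To make the control flow of the reward regime concrete, here is a minimal Python sketch of one pass through the program above. It assumes 0-based receptor numbers, plain lists for xf, yf, vf, xs, ys, vs, and a hypothetical callback `judge` standing in for read(a); the function names are illustrative only. The usage line shows the stall just described: with all-zero weights the summators always tie and nothing is learned.

```python
from typing import Callable, List, Optional


def active(r: List[bool], exc: int, inh: int) -> bool:
    """A (1, 1, 1)-neuron fires when its exciting receptor is lit and its inhibiting one is dark."""
    return r[exc] and not r[inh]


def reward_step(r: List[bool], judge: Callable[[int], bool],
                xf: List[int], yf: List[int], vf: List[float],
                xs: List[int], ys: List[int], vs: List[float]) -> Optional[int]:
    """Show one image in the reward regime; return the response p, or None on a tie."""
    sf = sum(w for x, y, w in zip(xf, yf, vf) if active(r, x, y))
    ss = sum(w for x, y, w in zip(xs, ys, vs) if active(r, x, y))
    if sf == ss:
        return None  # the ALGOL program jumps back to L: no response, no learning
    p = 1 if sf > ss else 2
    # judge(p) plays the role of punch(p); read(a).
    # Reward constant 1, penalty constant 0: only a correct answer changes weights.
    if judge(p):
        xa, ya, va = (xf, yf, vf) if p == 1 else (xs, ys, vs)
        for j in range(len(va)):
            if active(r, xa[j], ya[j]):
                va[j] += 1
    return p


# With all-zero weights every image ties, so the weights never change.
vf, vs = [0], [0]
print(reward_step([True, False], lambda p: True, [0], [1], vf, [1], [0], vs), vf)  # None [0]
```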

This last limitation can be avoided if the teacher does not simply supply a reward signal but communicates to the perceptron the true number of the pattern to which the image being shown belongs. This is precisely the way the perceptron functions in the learning regime considered in the preceding chapter. Let us indicate the changes which must be made in the program described above for this new type of signal a.

In the declarations the quantity a must be declared as a quantity of integer type and not Boolean. The program changes reduce to the following. After the operator if sf < ss then p := 2, in place of the operators if sf = ss then go to L; punch(p) it is necessary to use the single operator if sf ≠ ss then punch(p). Then, in the conditional operators following the operator read(a), the conditions if a ∧ (p = 1) and if a ∧ (p = 2) must be replaced by the conditions if a = 1 and if a = 2 respectively.
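A minimal Python sketch of the modified learning step, under the same assumptions as before (0-based receptor numbers, lists for the arrays, hypothetical function names). Here `a` is the integer pattern number supplied by the teacher, and the active neurons of that pattern are reinforced even when the summators tie, so learning can start from zero weights.

```python
from typing import List, Optional


def teacher_step(r: List[bool], a: int,
                 xf: List[int], yf: List[int], vf: List[float],
                 xs: List[int], ys: List[int], vs: List[float]) -> Optional[int]:
    """Learning step in which the teacher states the true pattern number a (1 or 2).

    Returns the response p (None when sf = ss, i.e. nothing is punched);
    the active neurons of pattern a are reinforced in either case.
    """
    def active(x: int, y: int) -> bool:
        return r[x] and not r[y]

    sf = sum(w for x, y, w in zip(xf, yf, vf) if active(x, y))
    ss = sum(w for x, y, w in zip(xs, ys, vs) if active(x, y))
    p = None if sf == ss else (1 if sf > ss else 2)
    # Corresponds to the conditions "if a = 1" / "if a = 2" after read(a).
    xa, ya, va = (xf, yf, vf) if a == 1 else (xs, ys, vs)
    for j in range(len(va)):
        if active(xa[j], ya[j]):
            va[j] += 1
    return p
```

Starting from zero weights, the first call returns None but already raises the weight of the active first-pattern neuron, so the next identical image is classified as pattern 1.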

It is also not difficult to describe the changes which must be made in the original perceptron program in order to simulate the self-learning regime of the perceptron rather than the learning regime. To do this the quantity a is completely excluded from the program together with the corresponding operator read(a). In the subsequent conditional operators, the terms ∧ (sf > ss) and ∧ (sf < ss) respectively must be added to the conditions standing after the service word do.
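One way to read the self-learning modification, sketched in Python under the same assumptions (0-based receptors, lists for the arrays, hypothetical names): with no teacher signal, the active neurons of whichever pattern wins the comparison of sf and ss are reinforced, and a tie changes nothing.

```python
from typing import List, Optional


def self_learning_step(r: List[bool],
                       xf: List[int], yf: List[int], vf: List[float],
                       xs: List[int], ys: List[int], vs: List[float]) -> Optional[int]:
    """Self-learning regime: reinforce the winning pattern's active neurons."""
    def active(x: int, y: int) -> bool:
        return r[x] and not r[y]

    sf = sum(w for x, y, w in zip(xf, yf, vf) if active(x, y))
    ss = sum(w for x, y, w in zip(xs, ys, vs) if active(x, y))
    if sf == ss:
        return None  # neither added condition (sf > ss) nor (sf < ss) holds
    p = 1 if sf > ss else 2
    xa, ya, va = (xf, yf, vf) if p == 1 else (xs, ys, vs)
    for j in range(len(va)):
        if active(xa[j], ya[j]):
            va[j] += 1
    return p
```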
Thus, in both the learning and the self-learning regimes α-perceptrons can easily be simulated in the ALGOL language and consequently on universal electronic digital machines. It is easy to see that the same holds for any modifications and generalizations of the perceptron circuit.

Let us now use the ALGOL language to describe still another self-organizing algorithmic system, one which simulates the process of biological evolution and the formation of new species. The modeling of such a system (although a somewhat different one) on a universal electronic digital machine has been accomplished by Letichevskiy [49].

Let us consider a discrete space consisting of a finite set of points numbered from 1 to n inclusive. Let us assume for simplicity that this set is cyclically ordered. In other words, for each point i we define two neighboring points: the point bi directly preceding it and the point fi directly following it. If i ≠ 1, then bi = i - 1; for i = 1 we set bi = n. Similarly, if i ≠ n, then fi = i + 1, and for i = n we set fi = 1.
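The neighbor rule can be written down directly. A small Python sketch, with the hypothetical function name `neighbours` and points numbered 1 to n as in the text:

```python
from typing import Tuple


def neighbours(i: int, n: int) -> Tuple[int, int]:
    """Return (bi, fi) for point i of the cyclically ordered space 1, ..., n."""
    bi = i - 1 if i != 1 else n
    fi = i + 1 if i != n else 1
    return bi, fi


print([neighbours(i, 5) for i in (1, 3, 5)])  # [(5, 2), (2, 4), (4, 1)]
```

The same values can also be obtained arithmetically as bi = (i - 2) mod n + 1 and fi = i mod n + 1; the explicit form above simply mirrors the case analysis of the text.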

To every point i of the space we assign some state s[i], which can take any integer value from 0 to k inclusive. If s[i] = 0, the corresponding point is considered "lifeless." If, however, s[i] ≠ 0, then we assume that at the point i there is some "living being" in the state s[i]. As such "living beings" in the model under consideration we select abstract automata with the same number of states (equal to k) but, generally speaking, with different transition and output tables.

In addition, for each point i of our space there is given the number F[i], equal to 1 or 0 according to whether or not there is "food" at the ith point. If there is "food" at a particular point, its supply is assumed to be so large (or self-replenishing) that the automaton located at this point practically does not alter it in the course of its "feeding." The array F is altered at each successive cycle of operation of the algorithm in accordance with some "law of nature" which we shall assume to lie outside our algorithm.

In addition to the state s[i] of the automaton occupying the point i, we associate with this automaton two other numbers, namely the readings of its "life" counter L[i] and of its so-called "hunger" counter H[i]. The quantity L[i] increases by unity at each cycle of operation of the algorithm, and when its value exceeds some prespecified threshold l, the corresponding automaton passes into the zero state, i.e., simply speaking, it is destroyed (thereby simulating natural death).

The quantity H[i] increases by unity if F[i] = 0 (i.e., when the automaton is located at a point of the space without "food") and decreases by unity in the opposite case, without, however, taking negative values (when H[i] = 0 we set H[i] - 1 = 0). When the quantity H[i] exceeds some level h which is fixed in advance, the corresponding automaton passes into the zero state (thereby simulating death from hunger).
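The two counter rules amount to a few lines per cycle. A Python sketch for a single automaton, where `update_counters`, `l_max` (the life threshold) and `h_max` (the hunger level h) are hypothetical names; the caller passes the food signal F at the automaton's point:

```python
from typing import Tuple


def update_counters(life: int, hunger: int, food: int,
                    l_max: int, h_max: int) -> Tuple[int, int, bool]:
    """One cycle of the "life" and "hunger" counters of a single automaton.

    food is 1 if there is food at the automaton's point, else 0.
    Returns the new counter values and whether the automaton dies.
    """
    life += 1
    hunger = hunger + 1 if food == 0 else max(hunger - 1, 0)  # never negative
    dies = life > l_max or hunger > h_max  # natural death or death from hunger
    return life, hunger, dies
```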

The input signals of the automaton located at the point i are the states s[bi] and s[fi] of the neighboring points, and also the signals F[bi], F[i], F[fi] on the presence or absence of food at the point i itself and at the neighboring points. The output signal m is the so-called motion of the automaton, in other words, the increment of the number of the spatial point occupied by the automaton in the given operating cycle. We shall assume that the quantity m can take only three different values: 0, 1 and -1.

In view of the presence of five input channels (six arguments in all, counting the automaton's own state), the transition and output tables of each automaton can be specified in the form of six-dimensional arrays. To specify the transition and output tables of all the automata at the same time we use the seven-dimensional arrays SP[1:n, 1:k, 0:k, 0:k, 0:1, 0:1, 0:1] and M[1:n, 1:k, 0:k, 0:k, 0:1, 0:1, 0:1] respectively. The first index in each of these arrays indicates the number of the cell i occupied by the automaton, the second indicates the state s[i] of this automaton, the third and fourth indicate the states s[bi] and s[fi] of the neighboring points, and the fifth, sixth and seventh indices are the signals F[i], F[bi] and F[fi] on the presence or absence of "food" at the point itself and at the neighboring points. We shall not impose any limitations on the motion in the output table M itself; however, the corresponding motion will actually be performed only when the point to which the automaton is shifted is not occupied by any other automaton.

The readings of the "life" and "hunger" counters are changed after the motion has been performed and the automaton has passed into its new state. If "death" of the automaton does not occur and its motion is nontrivial (i.e., the automaton does not remain at its previous location), then, provided certain additional conditions are fulfilled, "reproduction" of the automaton by fission takes place. In this case the shifted automaton A completely retains its structure, except that its "life" counter is set to zero. At the place occupied by the automaton A before the move there appears its "double," differing from A only in that in each of the two arrays which specify the transitions and outputs of A, one number (respectively the new state or the motion of the automaton) is replaced by a random number. The "life" counter of the new automaton is also set to zero.
The additional conditions for reproduction mentioned above are that the reading of the "life" counter lie between two numbers ll and lu fixed a priori, while the reading of the "hunger" counter does not exceed some number hu, also fixed ahead of time.

To generate random numbers we fix the special integer procedure random(a, b), which at each call delivers some integer lying on the closed segment [a, b]. All the integers of this segment are considered equally probable. Methods of constructing such procedures were described above.
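In Python the standard library already offers a procedure with exactly these properties: `random.Random.randint(a, b)` returns an integer uniformly distributed over a..b with both end points included. The wrapper name `random_int` and the fixed seed below are assumptions made for reproducibility; the sketch does not reproduce the construction methods referred to in the text.

```python
import random

_rng = random.Random(2024)  # hypothetical fixed seed so that runs can be repeated


def random_int(a: int, b: int) -> int:
    """Counterpart of the integer procedure random(a, b): uniform over the closed segment [a, b]."""
    return _rng.randint(a, b)  # randint includes both end points


motions = [random_int(-1, 1) for _ in range(10)]  # e.g. random motions for a mutated table entry
```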

In one operating cycle the algorithm we have described must scan all the points of our space, performing at these points the changes listed above. After finishing each cycle the algorithm must read in the new values of all the components of the array F[1:n] and begin the next cycle. We shall count the cycles with the aid of a special quantity t. When this quantity reaches the value p, fixed in advance, the algorithm must terminate its operation.

To facilitate programming the described algorithm in ALGOL, we introduce three blocks which describe the movement of the automaton located at the ith point, its "death," and its "reproduction." For brevity let us denote these blocks by B1, B2 and B3 respectively, and let us write the corresponding ALGOL program for each of them.

The block B1:
begin integer m, j, bs, fs, f, bf, ff;
  m := M[i, s[i], s[bi], s[fi], F[i], F[bi], F[fi]];
  pi := if i + m > n then 1 else if i + m < 1 then n else i + m;
  if s[pi] ≠ 0 then pi := i;
  comment pi is the number of the point to which the considered automaton is displaced;
  L[pi] := L[i] + 1;
  H[pi] := if F[pi] = 0 then H[i] + 1 else if H[i] ≠ 0 then H[i] - 1 else 0;
  if pi ≠ i then
    for j := 1 step 1 until k do
    for bs := 0 step 1 until k do
    for fs := 0 step 1 until k do
    for f := 0, 1 do for bf := 0, 1 do for ff := 0, 1 do
    begin SP[pi, j, bs, fs, f, bf, ff] := SP[i, j, bs, fs, f, bf, ff];
      M[pi, j, bs, fs, f, bf, ff] := M[i, j, bs, fs, f, bf, ff]
    end;
  s[pi] := SP[i, s[i], s[bi], s[fi], F[i], F[bi], F[fi]]
end
The block B1 accomplishes the displacement of the automaton from the point i to the point pi, its transfer into the new state (determined by the situation at the moment when the automaton is located at the point i), the change of the readings of the "life" and "hunger" counters, and the rewriting of the arrays which specify the transition and output functions of the automata, so as to bring them into correspondence with the new location of the automaton under consideration. The values of i and pi are retained on exit from the block.
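A Python sketch of block B1, assuming 1-based lists with index 0 unused for s, F, L and H, and representing SP[i, ...] and M[i, ...] as per-point dictionaries keyed by the situation tuple (s[i], s[bi], s[fi], F[i], F[bi], F[fi]). The names `block_b1` and `constant_table` are hypothetical. Because the space is cyclic, the sketch wraps a motion past either end of 1..n around to the other end.

```python
from itertools import product
from typing import Dict, List, Tuple

Situation = Tuple[int, int, int, int, int, int]  # (s[i], s[bi], s[fi], F[i], F[bi], F[fi])
Table = Dict[Situation, int]


def constant_table(k: int, value: int) -> Table:
    """Hypothetical helper: a table giving the same entry for every situation."""
    return {key: value for key in product(range(1, k + 1), range(k + 1), range(k + 1),
                                          (0, 1), (0, 1), (0, 1))}


def block_b1(i: int, n: int, s: List[int], F: List[int], L: List[int], H: List[int],
             SP: Dict[int, Table], M: Dict[int, Table]) -> int:
    """Move the automaton at point i; return the point pi where it ends up."""
    bi = i - 1 if i != 1 else n
    fi = i + 1 if i != n else 1
    key = (s[i], s[bi], s[fi], F[i], F[bi], F[fi])
    pi = i + M[i][key]          # motion m is -1, 0 or +1
    if pi > n:
        pi = 1
    elif pi < 1:
        pi = n
    if s[pi] != 0:              # target occupied (or m = 0): stay in place
        pi = i
    L[pi] = L[i] + 1
    H[pi] = H[i] + 1 if F[pi] == 0 else max(H[i] - 1, 0)
    if pi != i:                 # the tables travel with the automaton
        SP[pi] = dict(SP[i])
        M[pi] = dict(M[i])
    s[pi] = SP[i][key]          # new state, from the situation at the old point
    return pi
```

For example, with n = 3, k = 1, a single automaton at point 1 whose table always says "move +1", and food at point 2, the automaton ends up at point 2 with L[2] = 1 and H[2] = 0.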

The block B2 is very simple: begin s[pi] := 0 end.

We note that the program will be constructed so that the values of L[pi] and H[pi] retained at the point pi after the death of the automaton, and the values of the corresponding components of the arrays SP and M, cannot lead to errors later on. This is achieved because, when the program is repeated, the listed quantities are defined anew by assignment operators before they are used.

The "reproduction" block B3:
begin integer rj, rbs, rfs, rf, rbf, rff, j, bs, fs, f, bf, ff;
  Boolean B;
  rj := random(1, k);
  rbs := random(0, k);
  rfs := random(0, k);
  rf := random(0, 1);
  rbf := random(0, 1);
  rff := random(0, 1);
  for j := 1 step 1 until k do
  for bs := 0 step 1 until k do
  for fs := 0 step 1 until k do
  for f := 0, 1 do for bf := 0, 1 do for ff := 0, 1 do
  begin B := j = rj ∧ bs = rbs ∧ fs = rfs ∧ f = rf ∧ bf = rbf ∧ ff = rff;
    SP[i, j, bs, fs, f, bf, ff] := if B then random(1, k)
      else SP[pi, j, bs, fs, f, bf, ff];
    M[i, j, bs, fs, f, bf, ff] := if B then random(-1, 1)
      else M[pi, j, bs, fs, f, bf, ff]
  end
end
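A Python sketch of block B3 in a per-point dictionary representation (tables keyed by the situation tuple; hypothetical names `block_b3` and `constant_table`). Copying the parent's tables and then overwriting one randomly chosen situation has the same effect as the ALGOL loop, which compares every entry against the random indices rj, rbs, rfs, rf, rbf, rff. Resetting both "life" counters follows the prose description of reproduction; the sketch leaves s[i] untouched.

```python
import random
from itertools import product
from typing import Dict, List, Tuple

Situation = Tuple[int, int, int, int, int, int]
Table = Dict[Situation, int]


def constant_table(k: int, value: int) -> Table:
    """Hypothetical helper: a table giving the same entry for every situation."""
    return {key: value for key in product(range(1, k + 1), range(k + 1), range(k + 1),
                                          (0, 1), (0, 1), (0, 1))}


def block_b3(i: int, pi: int, k: int, SP: Dict[int, Table], M: Dict[int, Table],
             L: List[int], rng: random.Random) -> Situation:
    """Put at point i a mutated double of the automaton that has moved to pi."""
    # The random situation (rj, rbs, rfs, rf, rbf, rff) whose entries are replaced.
    key = (rng.randint(1, k), rng.randint(0, k), rng.randint(0, k),
           rng.randint(0, 1), rng.randint(0, 1), rng.randint(0, 1))
    SP[i] = dict(SP[pi])
    M[i] = dict(M[pi])
    SP[i][key] = rng.randint(1, k)   # random new state, like random(1, k)
    M[i][key] = rng.randint(-1, 1)   # random motion, like random(-1, 1)
    L[i] = L[pi] = 0                 # both "life" counters restart
    return key
```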
The entire program for modeling the evolution process can now be represented as follows:

begin integer t, i, bi, fi, p, g, pi;
  integer array F[1:n], s[1:n], L[1:n], H[1:n],
    SP[1:n, 1:k, 0:k, 0:k, 0:1, 0:1, 0:1], M[1:n, 1:k, 0:k, 0:k, 0:1, 0:1, 0:1];
  for g := 1 step 1 until n do
    L[g] := H[g] := 0;
  t := 0; i := 1; read(s); read(SP); read(M);
Q: read(F); fi := if i ≠ n then i + 1 else 1;
  if s[i] = 0 then begin i := fi; go to P end;
  bi := if i ≠ 1 then i - 1 else n;
  ⟨block B1⟩;
  if L[pi] > l ∨ H[pi] > h then
    ⟨block B2⟩;
  if pi ≠ i ∧ H[pi] < hu ∧ L[pi] < lu ∧ L[pi] > ll then
    ⟨block B3⟩;
  if pi ≠ fi then i := fi else i := if fi ≠ n then fi + 1 else 1;
P: if i = 1 ∨ pi = 1 then t := t + 1; if t < p then go to Q
end
We note that the program we have constructed is not economical with respect to memory use, and it requires rewriting the multidimensional arrays. A far more economical program can be constructed if we introduce a numbering of the automata and index the arrays L, H, SP and M by the number of the corresponding automaton instead of the number of the spatial point.
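A rough Python sketch of the more economical layout suggested above, under stated assumptions: each automaton is a record kept under its own number, and a list `occupant` (a hypothetical name) records which automaton, if any, stands at each point (0 meaning none). A move then rewrites two list elements instead of copying whole tables; `table_entries` counts the entries of one SP (or M) table that the per-point layout copies on every move.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Situation = Tuple[int, int, int, int, int, int]


@dataclass
class Automaton:
    """Hypothetical per-automaton record replacing the per-point rows of s, L, H, SP, M."""
    state: int
    SP: Dict[Situation, int] = field(default_factory=dict)
    M: Dict[Situation, int] = field(default_factory=dict)
    life: int = 0
    hunger: int = 0


def table_entries(k: int) -> int:
    """Entries in one SP (or M) table: k * (k + 1)**2 * 2**3."""
    return k * (k + 1) ** 2 * 2 ** 3


def move(occupant: List[int], i: int, pi: int) -> None:
    """Move the automaton standing at point i to point pi without copying its tables."""
    if pi != i:
        occupant[pi], occupant[i] = occupant[i], 0
```

For k = 3, each move in the per-point layout copies 384 entries of SP and as many of M, whereas here only the two affected elements of `occupant` change.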

In the actual modeling of the evolutionary process on a universal electronic digital machine in [43], a program with more limited capabilities was used; nevertheless, the experiments showed that even under these conditions the quality of the simulation was quite satisfactory. For relatively simple "laws of nature" the adaptation of the automata to the surrounding medium and the formation of stable "species" were observed after several tens of thousands of cycles of operation of the algorithm and the replacement of a corresponding number of "generations." Initially the transition and output tables of the automata (the arrays SP and M in our case) were specified arbitrarily. In the course of evolution the poorly arranged automata "died out" and forms appeared which were better adapted for "life" under the given conditions.


Manuscript Page No. [Footnotes]

361 The diode matrices are two systems of conductors, usually termed buses, some of which are interconnected by diodes, i.e., elements which pass current in only one direction.

384 An attempt is usually made to avoid such roundoffs in ALGOL, since natural roundoff of the quantity 23.5 leads to 23 on some machines and to 24 on others.

396 If the step B is negative, then the value of the expression C - A is taken with reversed sign.

403 For a description of the SMALGOL-61 language see: Communications of the Assoc. for Comp. Mach., 1961, Vol. 4, No. 11, pages 499-502.
Chapter 6


PREDICATE CALCULUS AND THE PROBLEM OF AUTOMATION OF 
THE SCIENTIFIC CREATIVE PROCESSES 


§1. BASIC CONCEPTS OF PREDICATE CALCULUS 

As we mentioned in Chapter 2, the simplest component of mathematical logic, propositional calculus, does not really penetrate into the structure of elementary propositions, which limits its capabilities in the formalization of more complex thought processes. The next higher stage of mathematical logic with regard to complexity, termed restricted predicate calculus or first-order predicate calculus, possesses far stronger expressive capabilities.

One characteristic feature of predicate calculus is, first of all, that along with the variable propositions, which can take only two possible values ("true" and "false"), it introduces the so-called object variables, which range over some, generally speaking infinite, region of values customarily termed the object region. The values composing this region are usually termed objects.

Fixing a particular object region, we can construct propositional functions of the object variables, usually termed predicates: the n-place predicate $P(x_1, x_2, \ldots, x_n)$ is a variable proposition whose truth or falsity is determined by the sets of values of the object variables $x_1, x_2, \ldots, x_n$. If the predicate P is not identically true or identically false, then on some sets of values of the object variables it takes the value "true" and on others the value "false."

In the classical theory of predicates only the single-place predicates were called predicates (or properties). For the multiplace predicates the special term "relation" was used: two-place predicates were termed binary relations, three-place ones ternary relations, etc. For our purposes there is no need to emphasize this difference; we shall therefore term predicates any functions of any (greater than zero) number of object variables.

The use of predicates permits the construction of a formal language analogous to propositional calculus but, in contrast with it, penetrating into the structure of the elementary propositions. For example, the proposition "four is larger than two" is indecomposable in propositional calculus. However, if we introduce the predicate P(x, y) over the set of nonnegative integers as the object region, true if and only if the inequality x > y is satisfied, then this proposition is written in the form P(4, 2), which now gives an idea of its internal structure.
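The predicates of this and the next paragraph can be written as Boolean-valued functions. A Python sketch in which the set `GASES` is a small illustrative object region, not data from the text:

```python
def P(x: int, y: int) -> bool:
    """Two-place predicate on the nonnegative integers: true iff x > y."""
    return x > y


# A toy object region for the predicate "is a gas"; the membership list is illustrative only.
GASES = {"oxygen", "nitrogen", "helium"}


def Q(x: str) -> bool:
    """One-place predicate "is a gas" over a small, hypothetical object region."""
    return x in GASES


print(P(4, 2), P(2, 4), Q("oxygen"), Q("iron"))  # True False True False
```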

The internal structure of the proposition "oxygen is a gas" can be revealed in exactly the same way. To do this it is sufficient to introduce the predicate "is a gas," which takes the value "true" if and only if an object of the object region which actually is a gas is substituted into it. If we designate this predicate by Q(x), then the phrase above can be written in the form Q(oxygen).

In the formal construction of predicate calculus we are usually not interested in the exact objects of which a particular object region is constituted; it is sufficient to know only the number of all these objects or, more precisely, the power (cardinality) of the set of all the objects composing the object region. If the object region is finite or countable, the objects composing it can be replaced by their numbers. Thereby the object regions are reduced to sets of numbers, which facilitates the concrete expression of the corresponding predicates. Since predicates are variable propositions, all the operations used in the second chapter in the construction of propositional calculus can be applied to them. At the same time the use of object variables permits the introduction of several new operations which are specific to predicate calculus. These operations are constructed with the aid of the so-called quantifiers. Usually we limit ourselves to only two forms of quantifiers, termed existential quantifiers and generality (universal) quantifiers. We shall designate them by the symbols ∃x and ∀x respectively, where x indicates the variable on which the quantifier acts.

The expression ∃xP(x) is the conventional designation for the proposition "there exists an object x for which the predicate P is true." Similarly, the expression ∀xP(x) designates the proposition "for all objects x the predicate P is true." Here it is understood that the objects under discussion belong to a particular fixed object region M. If the region M consists of a finite number of objects $x_1, x_2, \ldots, x_s$, the expression ∃xP(x) reduces to the disjunction $P(x_1) \lor P(x_2) \lor \cdots \lor P(x_s)$, and the expression ∀xP(x) reduces to the conjunction $P(x_1) \land P(x_2) \land \cdots \land P(x_s)$. In the case of an infinite object region this reduction is not possible, since the constructive nature of our constructions excludes the use of infinite disjunctions and conjunctions.
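For a finite object region the reduction to a disjunction and a conjunction is exactly what Python's built-in `any` and `all` compute. A sketch with the hypothetical names `exists` and `forall` and the region {1, ..., 5}:

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def exists(region: Iterable[T], pred: Callable[[T], bool]) -> bool:
    """∃x P(x) over a finite region: the disjunction P(x1) ∨ P(x2) ∨ ..."""
    return any(pred(x) for x in region)


def forall(region: Iterable[T], pred: Callable[[T], bool]) -> bool:
    """∀x P(x) over a finite region: the conjunction P(x1) ∧ P(x2) ∧ ..."""
    return all(pred(x) for x in region)


def even(x: int) -> bool:
    return x % 2 == 0


M = range(1, 6)  # a finite object region {1, ..., 5}
print(exists(M, even), forall(M, even))  # True False
```

No such finite evaluation is available for an infinite region, which is exactly the limitation stated in the text.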

In the nonconstructive (so-called set-theoretic) approach to the construction of predicate calculus, we can always picture the expressions ∃xP(x) and ∀xP(x) as a disjunction and a conjunction extended to all the objects x composing the given object region M. In the constructive approach this representation can be used only for heuristic (inductive) reasoning and constructions, but not at all as a method of strict formal proof.

In the expressions ∃xP(x) and ∀xP(x) the variable x is bound by the corresponding quantifier. In contrast with free (unbound) variables, for example the variables x and y in the expression Q(x, y), bound variables do not have an independent (individual) value, since the proposition containing them actually does not depend on these variables. In this sense the role of bound variables in predicate calculus is completely analogous to the role played by the summation index i in the sum $\sum_i f_i$ or by the integration variable x in the definite integral $\int_a^b f(x)\,dx$. In particular, we can replace a bound variable with any other variable without altering the sense or value of the corresponding expression.

Any formula of restricted predicate calculus is constructed from elementary propositions with the aid of the four operations of propositional calculus (negation, disjunction, conjunction and implication) and two binding operations by means of the object quantifiers (generality and existential). The elementary propositions are usually the familiar variable propositions (propositional letters) and the propositional functions (predicates) defined above. Here the object region is assumed fixed, and the total number of symbols composing any formula must of necessity be finite. For uniformity of terminology the elementary variable propositions which do not depend on the object variables (i.e., the propositional letters) are conveniently regarded as zero-place predicates (propositional functions of an empty set of object variables).


Just as in propositional calculus, in predicate calculus we can use round brackets to designate the order of operations in formulas. These brackets are also used to establish the action region (scope) of the quantifiers which appear in a formula. For example, in the expression ∃x(P(x, y) ⊃ Q(x)) ⊃ R(x) the action region of the existential quantifier ∃x includes only the expression P(x, y) ⊃ Q(x). The variable x appearing in this expression is bound by the indicated quantifier, while the variable x in the predicate R(x) must be considered a free variable.
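The distinction between bound and free occurrences can be computed mechanically. A Python sketch, assuming a hypothetical tuple encoding of formulas: ("pred", name, variables), ("letter", name), ("not", A), ("and" | "or" | "imp", A, B) and ("forall" | "exists", variable, A). Applied to the example above, it reports both x (coming from R(x)) and y as free.

```python
from typing import FrozenSet

Formula = tuple  # hypothetical encoding described above


def free_vars(f: Formula) -> FrozenSet[str]:
    """Variables with at least one free occurrence in formula f."""
    tag = f[0]
    if tag == "pred":
        return frozenset(f[2])
    if tag == "letter":
        return frozenset()
    if tag == "not":
        return free_vars(f[1])
    if tag in ("and", "or", "imp"):
        return free_vars(f[1]) | free_vars(f[2])
    if tag in ("forall", "exists"):
        return free_vars(f[2]) - {f[1]}  # the quantifier binds its variable in its region
    raise ValueError(f"unknown connective {tag!r}")


# ∃x(P(x, y) ⊃ Q(x)) ⊃ R(x): the quantifier's region is the bracketed implication only.
example = ("imp",
           ("exists", "x", ("imp", ("pred", "P", ("x", "y")), ("pred", "Q", ("x",)))),
           ("pred", "R", ("x",)))
print(sorted(free_vars(example)))  # ['x', 'y']
```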

In order to avoid confusion during various transformations of formulas in predicate calculus, we usually prefer to redesignate the bound variables so that their notations differ both from one another and from the notations of all the free variables appearing in the same formula. In this case we can consider that the action region of each quantifier extends from the place of its occurrence to the very end of the formula. Thereby brackets designating action regions can be made superfluous. Hereafter we shall, as a rule, adhere to precisely this interpretation of the action regions of the quantifiers.

We note that in restricted predicate calculus only object variables may be bound by quantifiers. The predicates appearing in a formula are assumed to be unchanging. This limitation naturally restricts the range of applications of the logical calculus we are constructing, which explains the presence of the term "restricted" in its name. In the so-called extended predicate calculus, variable predicates and predicate quantifiers are used. In other words, expressions of the form "P(x) is valid for every predicate P" or "there exists a predicate Q for which the proposition Q(x) is true," etc., are permitted. With unlimited use of predicate quantifiers it becomes possible to construct internally contradictory formulas, and paradoxes appear. All this necessitates further complication of the corresponding calculi. However, we shall not concern ourselves with a detailed study of the extended predicate calculus, but shall concentrate our attention on the restricted predicate calculus. Therefore, hereafter, when we use the term predicate calculus (unless otherwise stipulated), we shall always mean the restricted predicate calculus. We also shall not consider other possible generalizations of predicate calculus, for example predicate calculus with several object regions rather than only one, etc.

Just as in the case of propositional calculus, in the construction of predicate calculus it is not sufficient to indicate only the method of writing the formulas. It is also necessary to give the rules for transforming the formulas, expressed by axioms. The axioms of predicate calculus include all 11 axioms of propositional calculus which were given in §5 of Chapter 2. In addition to them, four postulates specific to predicate calculus are introduced, to which we assign the numbers 12 to 15 inclusive:


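$$12.\quad \frac{C \supset P(x)}{C \supset \forall x\, P(x)}$$
$$13.\quad \forall x\, P(x) \supset P(t)$$
$$14.\quad P(t) \supset \exists x\, P(x)$$
$$15.\quad \frac{P(x) \supset C}{\exists x\, P(x) \supset C}$$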


With the aid of these postulates (axioms) we can perform the formal deduction of new formulas by exactly the same method as in propositional calculus. We note only that the expressions P(x) and P(t) in axioms 12-15 must be understood not only as elementary one-place predicates but also as any formulas of predicate calculus containing the letters x and t as free variables. Other free variables may also appear in the corresponding formulas. In the formal axiomatic construction of predicate calculus we do not usually use symbols of individual objects or individual predicates, so the predicates appearing in the formulas are taken to be arbitrary, not fixed, predicates. In place of the propositional letters A, B, C in axioms 1-15, any formulas of predicate calculus can be substituted, including those which contain free variables.

In all the substitutions discussed here it is understood that, in the formulas obtained as a result, free and bound variables, and also different bound variables, must be designated by different letters. This condition avoids the so-called collision of variables, which leads to unforeseen binding of variables that must be left free. Indeed, if, say, in the formula ∃xP(x) ⊃ C we substitute the formula Q(x) for C, then the existential quantifier ∃x would bind not only the variable in the predicate P but also the variable in the predicate Q. Collision of variables can always be avoided by renaming the bound variables. Hereafter, whenever such renaming is necessary, we shall always assume that it has been carried out.
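A Python sketch of a substitution that refuses to cause a collision of variables, using a hypothetical tuple encoding of formulas (("pred", name, variables), ("letter", name), ("not", A), ("and" | "or" | "imp", A, B), ("forall" | "exists", variable, A)). Following the convention that a quantifier's action region extends to the end of the formula, ∃xP(x) ⊃ C is encoded with C inside the scope of ∃x; substituting Q(x) for C is then rejected, while the same substitution succeeds once the bound variable is renamed to z.

```python
from typing import FrozenSet

Formula = tuple  # hypothetical tuple encoding described above


def free_vars(f: Formula) -> FrozenSet[str]:
    tag = f[0]
    if tag == "pred":
        return frozenset(f[2])
    if tag == "letter":
        return frozenset()
    if tag == "not":
        return free_vars(f[1])
    if tag in ("and", "or", "imp"):
        return free_vars(f[1]) | free_vars(f[2])
    if tag in ("forall", "exists"):
        return free_vars(f[2]) - {f[1]}
    raise ValueError(f"unknown connective {tag!r}")


def substitute(f: Formula, letter: str, g: Formula,
               bound: FrozenSet[str] = frozenset()) -> Formula:
    """Replace the propositional letter `letter` by g, refusing a collision of variables."""
    tag = f[0]
    if tag == "letter":
        if f[1] != letter:
            return f
        clash = free_vars(g) & bound  # free variables of g that would become bound
        if clash:
            raise ValueError(f"collision of variables: {sorted(clash)}")
        return g
    if tag == "pred":
        return f
    if tag == "not":
        return ("not", substitute(f[1], letter, g, bound))
    if tag in ("and", "or", "imp"):
        return (tag, substitute(f[1], letter, g, bound), substitute(f[2], letter, g, bound))
    if tag in ("forall", "exists"):
        return (tag, f[1], substitute(f[2], letter, g, bound | {f[1]}))
    raise ValueError(f"unknown connective {tag!r}")
```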

Provided the necessary precautions are taken to avoid collision of variables, all the results on the deducibility of some formulas from others obtained previously in propositional calculus (see formulas 1-7 in §5 of Chapter 2) carry over to predicate calculus. The deduction theorem (with corresponding stipulations) also remains valid in predicate calculus. In particular, if the formula B is deducible (in predicate calculus) from the formula A, then the formula A ⊃ B (provided measures are taken to prevent collision of variables) is deducible in predicate calculus.


The following rules of deducibility are easily derived from axioms 12-15:

insertion of the generality quantifier (∀-insertion):
$$A(x) \overset{x}{\vdash} \forall x\, A(x); \tag{115}$$

insertion of the existential quantifier (∃-insertion):
$$A(t) \vdash \exists x\, A(x); \tag{116}$$

removal of the generality quantifier (∀-removal):
$$\forall x\, A(x) \vdash A(t); \tag{117}$$

the so-called ∃-removal (see Kleene [42]): if $\Gamma, A(x) \overset{x}{\vdash} C$, then
$$\Gamma, \exists x\, A(x) \overset{x}{\vdash} C. \tag{118}$$

The variable x written above the deducibility symbol ⊢ means that this variable is varied (converted from a free variable into an apparent, i.e. bound, variable) in the course of the deduction in accordance with the axioms (deduction rules) 12 and 15. The free variables which are not varied in the deduction process are customarily termed fixed variables. This last concept can be used to refine the formulation of the deduction theorem: if Γ, A ⊢ B holds, and in the deduction process the free variables occurring in the formula A remain fixed, then Γ ⊢ A ⊃ B holds.
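Using the equivalence symbol ~ in the same sense as in propositional calculus, and using A to denote any formula not containing the free variable x, we can easily establish the following relations:
$$\vdash \forall x\, A \sim A, \qquad \vdash \exists x\, A \sim A; \tag{119}$$
$$\vdash \forall x \forall y\, P(x, y) \sim \forall y \forall x\, P(x, y); \tag{120}$$
$$\vdash \exists x \exists y\, P(x, y) \sim \exists y \exists x\, P(x, y); \tag{121}$$
$$\vdash \forall x\, P(x) \supset \exists x\, P(x); \tag{122}$$
$$\vdash \exists x \forall y\, P(x, y) \supset \forall y \exists x\, P(x, y). \tag{123}$$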


Rules (120) and (121) show that the order of application of like quantifiers can be changed. For unlike quantifiers this does not hold, since the relation ∀x∃yP(x, y) ⊃ ∃y∀xP(x, y), dual to relation (123), does not in general hold in predicate calculus. To convince ourselves of this it is sufficient to take as the object region the set of all natural numbers, and as P(x, y) the predicate which is true if and only if x < y. Then the formula ∀x∃yP(x, y) expresses the proposition "for every natural number x there exists a natural number y which is larger than x." At the same time the formula ∃y∀xP(x, y) is the proposition "there exists a natural number which is larger than all the natural numbers." The first proposition is true and the second is false. Therefore the proposition ∀x∃yP(x, y) ⊃ ∃y∀xP(x, y) is false under this interpretation and must not be deducible in a (contensively) consistent calculus.
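On a finite object region the natural-number counterexample cannot be used (the largest element has no larger one), but the same failure can be exhibited by brute force. The Python sketch below, with the hypothetical function name `check_quantifier_order`, runs through all binary relations P on a small region. It confirms that (123) never fails there, while the dual formula does; on the region {0, 1} the dual fails for exactly two relations, x = y and x ≠ y.

```python
from itertools import product
from typing import Dict, Sequence, Tuple


def check_quantifier_order(domain: Sequence[int]) -> Tuple[bool, int]:
    """Run through every binary relation P on a finite domain.

    Returns whether ∃x∀yP(x,y) ⊃ ∀y∃xP(x,y), formula (123), held for every P,
    and for how many P the dual formula ∀x∃yP(x,y) ⊃ ∃y∀xP(x,y) was false.
    """
    pairs = [(x, y) for x in domain for y in domain]
    formula_123_always = True
    dual_failures = 0
    for bits in product((False, True), repeat=len(pairs)):
        P: Dict[Tuple[int, int], bool] = dict(zip(pairs, bits))
        ex_x_all_y = any(all(P[x, y] for y in domain) for x in domain)
        all_y_ex_x = all(any(P[x, y] for x in domain) for y in domain)
        all_x_ex_y = all(any(P[x, y] for y in domain) for x in domain)
        ex_y_all_x = any(all(P[x, y] for x in domain) for y in domain)
        if ex_x_all_y and not all_y_ex_x:
            formula_123_always = False
        if all_x_ex_y and not ex_y_all_x:
            dual_failures += 1
    return formula_123_always, dual_failures


print(check_quantifier_order([0, 1]))  # (True, 2)
```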

We understand the contensive consistency of predicate calculus (and of propositional calculus as well) in the sense that only identically true formulas can be deducible in it, i.e., formulas which remain true for any object region and for any concrete interpretation of the predicates occurring in them. Without binding ourselves to the requirement of constructivity of arguments (i.e., remaining within the set-theoretic approach to predicate calculus), it is easy to see that the formulas expressed by axioms 13 and 14, just like the formulas expressed by axioms 1-10 of propositional calculus, are identically true.

The deduction rules (11, 12 and 15) also lead to identically true
formulas provided their premises are identically true. As an example
let us consider deduction rule 12. The identical truth of the premise
$C \supset P(x)$ (where $C$ does not contain $x$ free) can obtain only if,
under each interpretation, either the proposition $C$ is false or the
proposition $P(x)$ is true for every $x$. It is evident that in both of
these cases the proposition $C \supset \forall x P(x)$ is also true.

By similar arguments the contensive consistency of predicate
calculus is proved (although not completely constructively). The
nonconstructivity here is associated with the implicitly assumed
possibility of running through all the values of the object variables
when determining identical truth, which in the case of an infinite
object region requires an infinite number of steps; this is not in
agreement with the requirement of finiteness, mandatory for strictly
constructive constructions.

Limiting ourselves to finite object regions only, it is easy to
give a completely constructive character to the proof of the
consistency of predicate calculus. With this limitation predicate
calculus essentially reduces to propositional calculus, since
quantifier binding is then simply a short way of writing the
conjunctions and disjunctions extended over all the objects of the
object region, and the relations expressed by axioms 12-15 are
deducible from axioms 1-11. Thanks to the possibility of such an
interpretation, the question of the consistency of predicate calculus
reduces to the corresponding question for propositional calculus,
which was resolved earlier.

A similar method is used to establish the formal consistency (al- 
so termed simple consistency) of predicate calculus, i.e., the im- 
possibility of deduction in this calculus of any formula together with 
its negation. 

The problem of the contensive completeness of predicate calculus,
i.e., the possibility of formal deduction in this calculus of any
identically true formula, was resolved in the positive sense by Godel
[16]. It is obvious that in the case of infinite object regions the
establishment of the contensive completeness of predicate calculus
requires the use of material which goes beyond the limits of finite
mathematics. In the case of finite object regions the question of the
contensive completeness of predicate calculus reduces to the
corresponding question for propositional calculus and is therefore
resolved constructively.

In contrast with contensive completeness, also termed completeness
in the broad sense, completeness in the narrow sense does not obtain
in predicate calculus. Indeed, to the list of axioms of predicate
calculus there can be adjoined the formula
$\exists x P(x) \supset \forall x P(x)$, which is not deducible in this
calculus and does not lead to the occurrence of a contradiction. The
consistency of the axiom system arising from this adjunction becomes
clear on considering an object region consisting of a single object:
in this case the newly adjoined axiom becomes an identically true
formula. At the same time, for an object region consisting of the two
objects $x$ and $y$, this axiom is converted into the formula
$P(x) \vee P(y) \supset P(x) \wedge P(y)$, which is not identically true
and therefore is not deducible from the remaining axioms.

With the set-theoretic approach to the construction of predicate 
calculus the following interesting theorem due to Mal'tsev [52] can 
be proved. 

Theorem 1. If an infinite disjunction of (finite) formulas of
restricted predicate calculus is an identically true formula, then
some finite disjunction of these formulas is identically true.

This theorem can be used successfully for the proof of the so- 
called local theorems, which in several cases make it possible to 
transfer to the infinite sets the properties which are valid for all 
their finite subsets. 

Along with the identically true formulas, in predicate calculus 
it is useful to consider the so-called satisfiable formulas. Satisfi-
able is the term given to a formula which can be made true with the
selection of a suitable object region and a proper definition of the 
predicates given on it. It is understood that the formulas in question 
here do not contain symbols of the individual objects or individual 
predicates. 

Every identically true formula is also satisfiable, but the
reverse is of course not true in the general case. An example of a
satisfiable but not identically true formula is the formula
$\exists x P(x) \supset \forall x P(x)$, which was considered above. This
formula is identically true only on those object regions which consist
of one single object.

It is clear that a formula which is not satisfiable on some object
region is identically false on this region. The negations of
identically false formulas are identically true formulas. Thereby
there is established the connection between the concepts of
satisfiability and identical truth of the formulas of predicate
calculus.

We can construct examples of formulas which are not satisfiable
on any finite object region, but which are satisfiable on infinite
object regions. Moreover, the following theorem due to Löwenheim is
valid.

Theorem 2. If a formula of predicate calculus is satisfiable on
some infinite object region, then it is also satisfiable on a
denumerable object region.

The solvability problem for predicate calculus consists in finding
a single effective technique (algorithm) for determining the
satisfiability or nonsatisfiability (on some object region) of any
given formula of predicate calculus. In contrast with propositional
calculus, where a similar algorithm was constructed without any
difficulty, the solvability problem for predicate calculus in the
general case, as shown by Church and Turing, has no solution. In other
words, there does not exist a single constructive technique for
establishing the satisfiability or nonsatisfiability of an arbitrary
formula of predicate calculus.

Quite frequently the problem of solvability for predicate calculus
is formulated in a somewhat different form: find an algorithm for
determining the truth (i.e., the identical truth) of any given formula
of this calculus.

In view of the contensive completeness of predicate calculus, the 
algorithm which differentiates the true formulas of the calculus from 
the false simultaneously solves the problem of the differentiation of
the provable and unprovable formulas of this calculus. We note also 
that from the truth of any formula there follows the nonsatisfiability 
of its negation. Therefore, if we could decide the question on the sat- 
isfiability or nonsatisfiability of all the formulas of predicate cal- 
culus we would have the possibility of also resolving the question on 
the truth of any formula. Unfortunately, in the general case neither 
the first nor second questions have solutions. 

Thus, with respect to the problem of solvability the predicate 
calculus differs basically from propositional calculus. However, if we 
limit ourselves to certain particular forms of the formulas the deci- 
sion algorithm can be constructed in the case of predicate calculus as 
well. 

Such an algorithm can, for example, be constructed for the formulas
of predicate calculus which contain only single-place predicates. This
follows from the fact that, to establish the satisfiability or
nonsatisfiability of a formula containing $n$ single-place predicates,
it is sufficient to consider object regions consisting of no more than
$2^n$ objects (objects on which all $n$ predicates take the same values
cannot be distinguished by the formula, and there are at most $2^n$
combinations of values). As a result the verification of the
satisfiability (or identical truth) of the predicate formula reduces
(after replacement of the quantifiers by disjunctions and conjunctions)
to the verification of the satisfiability (or, correspondingly, the
identical truth) of the corresponding formula of propositional
calculus.
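The bound of $2^n$ objects turns the question into a finite search. Below is a brute-force Python sketch under our own assumptions: a formula is supplied as a Python function of a finite domain and of the truth tables of its $n$ single-place predicates, and all function names are hypothetical. Its cost grows very quickly with $n$, so it illustrates decidability rather than an efficient procedure.

```python
from itertools import product

def interpretations(size, n_preds):
    # Each single-place predicate is a tuple of `size` truth values,
    # one for every object of the region.
    for bits in product([False, True], repeat=size * n_preds):
        yield [bits[i * size:(i + 1) * size] for i in range(n_preds)]

def identically_true_monadic(formula, n_preds):
    """Try every object region of 1 .. 2**n_preds objects and every
    interpretation; formula(domain, preds) evaluates the formula,
    preds[i][d] being the value of the i-th predicate on object d."""
    for size in range(1, 2 ** n_preds + 1):
        domain = range(size)
        for preds in interpretations(size, n_preds):
            if not formula(domain, preds):
                return False
    return True

def exists_implies_forall(domain, preds):
    # ∃x P(x) ⊃ ∀x P(x)
    P = preds[0]
    return not any(P[d] for d in domain) or all(P[d] for d in domain)

def syllogism(domain, preds):
    # ∀x (P(x) ⊃ Q(x)) ∧ ∃x P(x) ⊃ ∃x Q(x)
    P, Q = preds
    premise = all(not P[d] or Q[d] for d in domain) and any(P[d] for d in domain)
    return not premise or any(Q[d] for d in domain)

print(identically_true_monadic(exists_implies_forall, 1))  # False
print(identically_true_monadic(syllogism, 2))              # True
```

The first formula fails on the two-object region where $P$ holds for one object only, in agreement with the discussion of $\exists x P(x) \supset \forall x P(x)$ above.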

In the general case, when resolving the question of the
satisfiability or nonsatisfiability of a specific formula, it may be
of considerable assistance first to reduce this formula to a so-called
normal form. We shall distinguish two normal forms: the so-called
prenex form and the Skolem normal form. The prenex form is
characterized by the fact that all the quantifiers (if there are any)
must be located at the very beginning of the formula and the scope of
each of them must extend to the end of the formula. In the Skolem
normal form it is additionally required that all the existential
quantifiers precede all the generality quantifiers.

If the formula is written in the prenex form, then the portion
standing after the quantifiers (the quantifier-free portion of the
formula) can be considered as a formula of propositional calculus
(each predicate is considered here simply as a variable proposition).
But then we can exclude all the implication signs in this formula
(replacing $A \supset B$ by $\neg A \vee B$) and then reduce it to the
disjunctive normal form. A similar transformation reduces the original
predicate formula to some predicate formula equivalent to it. In many
cases the concept of the normal form of a predicate formula includes
not only the condition that the quantifiers be prenexed, but also the
mandatory reduction of its quantifier-free portion to the ideal
disjunctive normal form.

The following theorem is valid. 

Theorem 3. For every formula A of (restricted) predicate calculus
there exists an equivalent formula B written in prenex form. There
exists a single constructive technique (algorithm) for reducing any
(predicate) formula to the prenex form.
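
The validity of the formulated theorem follows from the easily
verifiable relations (in which $Q$ does not contain $x$ as a free
variable):

$\vdash \neg \forall x P(x) \sim \exists x \neg P(x);$ (124)
$\vdash \neg \exists x P(x) \sim \forall x \neg P(x);$ (125)
$\vdash Q \wedge \forall x P(x) \sim \forall x (Q \wedge P(x));$ (126)
$\vdash Q \wedge \exists x P(x) \sim \exists x (Q \wedge P(x));$ (127)
$\vdash Q \vee \forall x P(x) \sim \forall x (Q \vee P(x));$ (128)
$\vdash Q \vee \exists x P(x) \sim \exists x (Q \vee P(x)).$ (129)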


Since implication can be replaced by the operations of disjunction
and negation, with the aid of the above formulas, observing the
conditions which exclude the possibility of collision of the
variables, we can sequentially move the quantifiers past all the other
symbols which make up the formula until all the quantifiers appear in
the left part of the formula. For example, the formula
$P(x) \vee \neg \forall y Q(x,y)$ can first be transformed to its
equivalent formula $P(x) \vee \exists y \neg Q(x,y)$ and then to the
(also equivalent) formula $\exists y (P(x) \vee \neg Q(x,y))$, which is
the required prenex form of the original formula.
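The procedure just described can be sketched in Python as follows. The tuple encoding of formulas and the function names are our own hypothetical choices, and the sketch assumes the formula is already rectified (all bound variables distinct and none of them also occurring free), which is one way of meeting the condition on collision of variables; it performs no renaming. Pulling the quantifiers of the left operand out before those of the right one is only one of several valid orders.

```python
# Hypothetical encoding: ('atom', name, *vars), ('not', A), ('and', A, B),
# ('or', A, B), ('imp', A, B), ('forall', v, A), ('exists', v, A).
DUAL = {'forall': 'exists', 'exists': 'forall'}

def eliminate_imp(f):
    # Replace A ⊃ B by ¬A ∨ B everywhere.
    tag = f[0]
    if tag == 'atom':
        return f
    if tag == 'imp':
        return ('or', ('not', eliminate_imp(f[1])), eliminate_imp(f[2]))
    if tag == 'not':
        return ('not', eliminate_imp(f[1]))
    if tag in ('and', 'or'):
        return (tag, eliminate_imp(f[1]), eliminate_imp(f[2]))
    return (tag, f[1], eliminate_imp(f[2]))  # quantifier

def prenex(f):
    """Return (prefix, matrix) for an implication-free rectified formula."""
    tag = f[0]
    if tag == 'atom':
        return [], f
    if tag in ('forall', 'exists'):
        prefix, matrix = prenex(f[2])
        return [(tag, f[1])] + prefix, matrix
    if tag == 'not':
        # Relations (124), (125): negation flips every quantifier it passes.
        prefix, matrix = prenex(f[1])
        return [(DUAL[q], v) for q, v in prefix], ('not', matrix)
    # 'and' / 'or': relations (126)-(129), with commutativity of ∧ and ∨,
    # move the quantifiers of both operands out unchanged.
    left_prefix, left = prenex(f[1])
    right_prefix, right = prenex(f[2])
    return left_prefix + right_prefix, (tag, left, right)

def show(prefix, matrix):
    def m(g):
        if g[0] == 'atom':
            return f"{g[1]}({','.join(g[2:])})"
        if g[0] == 'not':
            return '¬' + m(g[1])
        op = ' ∨ ' if g[0] == 'or' else ' ∧ '
        return '(' + m(g[1]) + op + m(g[2]) + ')'
    symbols = {'forall': '∀', 'exists': '∃'}
    return ''.join(symbols[q] + v for q, v in prefix) + m(matrix)

f = ('or', ('atom', 'P', 'x'), ('not', ('forall', 'y', ('atom', 'Q', 'x', 'y'))))
print(show(*prenex(eliminate_imp(f))))  # ∃y(P(x) ∨ ¬Q(x,y))
```

The printed result is the prenex form obtained in the example above.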

A direct analogue of theorem 3 does not exist for the Skolem normal
form: not every formula of predicate calculus has an equivalent
formula in the Skolem normal form.

However, the concept of equivalence can be generalized so that any
formula of predicate calculus can be reduced to the Skolem normal
form. This generalization is given by the concept of so-called
deductive equivalence [61].

The formula A is termed deductively equivalent to the formula B
if by adjoining formula A to the axiom set of the calculus we obtain
the possibility of deducing formula B from the thus expanded system of
axioms, and, on the other hand, by adjoining formula B to the axiom
set we obtain the possibility of deducing formula A.

This definition is applicable not only to predicate calculus, but
also to any other logical calculus, in particular to propositional
calculus. Since in propositional calculus the adjoining of any
nondeducible formula to the axiom set makes all formulas deducible,
any two nondeducible formulas of propositional calculus are
deductively equivalent. It is also clear that any two deducible
formulas (in any calculus) are deductively equivalent. At the same
time, deductive equivalence of a deducible and a nondeducible formula
is impossible, since adjoining the first formula to the axiom system
does not make the second formula deducible.

Thus, in propositional calculus all deducible formulas are
deductively equivalent to one another, and so are all nondeducible
formulas. It is also easy to see that in predicate calculus (as,
moreover, in propositional calculus) conventional equivalence of
formulas implies their deductive equivalence. However, the reverse is
not true in general since, for example, two elementary propositional
variables (arbitrary letters) P and Q, which are not equivalent to one
another, are nevertheless deductively equivalent.

The following theorem due to Skolem is valid. 

Theorem 4. For every formula of (restricted) predicate calculus
there exists its deductively equivalent formula written in the Skolem 
normal form. There exists a single constructive technique (algorithm) 
which permits performing the reduction of any predicate formula to its 
deductively equivalent Skolem form. 

It can be shown that if two formulas are deductively equivalent,
then the identical truth of one of them implies the identical truth of
the other. Since there exists a technique for reducing any formula of
restricted predicate calculus to a deductively equivalent formula in
the Skolem normal form, when solving the problem of establishing the
identical truth of particular formulas we can replace these formulas
by their corresponding Skolem normal forms. This situation can also be
used for the proof of the contensive completeness of predicate
calculus, since for that proof it is sufficient, in view of what has
been said above, to establish the deducibility of all the identically
true formulas written in the Skolem normal form. Indeed, by
establishing the deducibility of all the indicated formulas we thereby
establish the deducibility of all their deductively equivalent
formulas, i.e., all the identically true formulas of predicate
calculus.

If a formula of predicate calculus contains free variables, it is
termed an open formula. Formulas which do not contain free variables
are customarily termed closed formulas. If $x_1, x_2, \ldots, x_n$ are
all the free variables of the open formula A, then the closed formula
$\forall x_1 \forall x_2 \ldots \forall x_n A$ is termed the closure of
formula A. Any formula B is deductively equivalent to its closure B',
and therefore these two formulas are either simultaneously identically
true or simultaneously not identically true.
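Forming the closure is a simple syntactic operation. The Python sketch below uses a hypothetical tuple encoding of formulas and hypothetical function names of our own; it collects the free variables and prefixes one generality quantifier for each. By relation (120) the order of these quantifiers is immaterial, so the sketch simply sorts the variable names.

```python
# Hypothetical encoding: ('atom', name, *vars), ('not', A),
# ('and' | 'or' | 'imp', A, B), ('forall' | 'exists', v, A).
def free_vars(f, bound=frozenset()):
    tag = f[0]
    if tag == 'atom':
        return {v for v in f[2:] if v not in bound}
    if tag == 'not':
        return free_vars(f[1], bound)
    if tag in ('and', 'or', 'imp'):
        return free_vars(f[1], bound) | free_vars(f[2], bound)
    return free_vars(f[2], bound | {f[1]})  # a quantifier binds f[1]

def closure(f):
    # Bind every free variable by a generality quantifier (sorted order).
    for v in sorted(free_vars(f), reverse=True):
        f = ('forall', v, f)
    return f

# A = ∃z R(x, z) ∨ P(y), with free variables x and y
A = ('or', ('exists', 'z', ('atom', 'R', 'x', 'z')), ('atom', 'P', 'y'))
print(sorted(free_vars(A)))                          # ['x', 'y']
print(closure(A) == ('forall', 'x', ('forall', 'y', A)))  # True
```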

If the problem of solvability is taken in the sense of finding
the algorithm which differentiates the true formulas of predicate
calculus from the false, then the procedure of closure of the
formulas, just as the procedure of reducing the formulas to the normal
form (prenex or Skolem), reduces the general problem of solvability to
the corresponding problem for formulas of some special form. Of
course, this reduction does not help to solve the problem of
solvability for all formulas of restricted predicate calculus. We can,
however, identify several quite broad classes of formulas for which
decision procedures exist. One class of this kind (formulas containing
only single-place predicates) was considered above.

The decidability problem has a positive solution for the case of
closed formulas written in the prenex form with either only generality
quantifiers (A-formulas) or only existential quantifiers (E-formulas).
If we denote the number of these quantifiers by m, then the following
theorems are valid [1].

Theorem 5. For closed A-formulas with m quantifiers, truth need be
established only for the object regions which contain no more than m
objects. If such a formula is true in the region consisting of m
objects, then it is an identically true formula.

Theorem 6. A closed E-formula is identically true if it is true in
the object region containing only one single object. If it is true in
some region, then it is also true in any other region with a larger
number of objects.
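Theorem 6 gives a particularly simple decision procedure: on a one-object region every atom $P(u, v, \ldots)$ has all its arguments equal to that single object, so its truth value depends only on the letter $P$, and the quantifier-free part of the E-formula becomes a formula of propositional calculus to be checked for tautology. A minimal Python sketch of this reduction follows; the tuple encoding and the function names are our own assumptions.

```python
from itertools import product

# Hypothetical encoding of the quantifier-free part: ('atom', name, *vars),
# ('not', A), ('and', A, B), ('or', A, B), ('imp', A, B).
def predicate_letters(f):
    if f[0] == 'atom':
        return {f[1]}
    return set().union(*(predicate_letters(g) for g in f[1:]))

def value_on_one_object(f, val):
    # With one object, the atom's value depends only on its letter.
    tag = f[0]
    if tag == 'atom':
        return val[f[1]]
    if tag == 'not':
        return not value_on_one_object(f[1], val)
    a = value_on_one_object(f[1], val)
    b = value_on_one_object(f[2], val)
    return {'and': a and b, 'or': a or b, 'imp': (not a) or b}[tag]

def closed_e_formula_identically_true(matrix):
    """Theorem 6: ∃x1 ... ∃xm matrix is identically true iff it is true on
    a one-object region under every interpretation of its predicates."""
    letters = sorted(predicate_letters(matrix))
    return all(value_on_one_object(matrix, dict(zip(letters, bits)))
               for bits in product([False, True], repeat=len(letters)))

# ∃x ∃y (P(x) ⊃ P(y)) is identically true; ∃x ∃y (P(x) ∧ ¬P(y)) is not.
print(closed_e_formula_identically_true(('imp', ('atom', 'P', 'x'), ('atom', 'P', 'y'))))        # True
print(closed_e_formula_identically_true(('and', ('atom', 'P', 'x'), ('not', ('atom', 'P', 'y')))))  # False
```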

The decision procedures for the closed A-formulas and E-formulas
follow directly from these theorems: just as in the case of formulas
with single-place predicates, the finiteness of the object region
permits reduction of the question of the truth of the predicate
formulas to the question of the truth of the corresponding formulas of
propositional calculus.

The decision problem has a positive solution also for all
AE-formulas, i.e., for those closed formulas of restricted predicate
calculus in whose prenex normal form all the generality quantifiers
precede all the existential quantifiers (in the Skolem normal form the
order of the quantifiers is reversed). All closed AEA-formulas in
which the number of existential quantifiers does not exceed two are
also decidable.

We note that in all the cases considered here decidability was
understood in the sense of establishing the truth or falsity of the
formulas. On transition to the concept of decidability in the sense
of establishing the satisfiability or nonsatisfiability of the
formulas, decidable classes of formulas are obtained from the classes
listed above (decidable in the first sense) by replacing all the
existential quantifiers by generality quantifiers and vice versa.
Thus, for example, the class of all EA-formulas and the class of all
EAE-formulas containing no more than two generality quantifiers are
decidable (in the sense of establishing satisfiability or
nonsatisfiability).

A large number of classes of formulas for which the decision
problem is resolved positively have now been established. The
limitations used to identify these classes concern not only the
nature, number and order of arrangement of the quantifiers, but also
the form of the quantifier-free parts of the formulas (written in the
prenex normal form).

The possibilities have also been investigated of the construction 
of decision procedures beyond the limits of the restricted predicate 
calculus, in particular the procedure for the resolution of certain 
formulas of second degree predicate calculus. In the second degree 
predicate calculus use is made not only of object quantifiers, but al- 
so of predicate quantifiers ("for any predicate P," "there exists the 
predicate P"), however the predicates can depend only on the object 
variables and cannot be included in the system of objects composing 
the object region. 

Second degree predicate calculus in the general form not only is
undecidable, but also (as shown by Godel) cannot have any complete
axiom system. Nevertheless, even in this calculus there exist quite
broad decidable parts. Such a part, for example, is the so-called
AND-calculus. In this calculus the object region is the set of all
natural numbers, and all the predicates are single-place. A
corresponding result was announced by Byukh [Büchi], who somewhat
earlier constructed a decision algorithm for a weakened variant of the
AND-calculus which has found application in the theory of finite
automata.

§2. FORMAL ARITHMETIC AND THE GODEL THEOREM 

A formal arithmetic can be constructed on the basis of (restricted)
predicate calculus. The objects used for the construction of the
formal arithmetic are the whole nonnegative numbers 0, 1, 2, 3, ....
On the set of all such numbers there are defined the conventional
arithmetic operations of addition and multiplication, and also the
operation of direct succession $a' = a + 1$ ($a = 0, 1, 2, \ldots$).
This operation gives a method for the unique representation of all
the natural numbers: $1 = 0'$, $2 = 1' = 0''$, $3 = 2' = 0'''$, etc.

Arithmetic expressions, customarily termed terms, are composed with
the aid of these operations from the whole nonnegative numbers and
variables which run through the whole nonnegative values. Examples of
such terms are the expressions $x$, $0'$, $x \cdot y' + a'' \cdot z$. We
note that in the absence of brackets determining a particular order
of operations, the direct succession operation has priority. After it
follows the multiplication operation and then addition. Thus, for
example, the expression $x \cdot y' + z'$ must be understood as
$((x) \cdot (y')) + (z')$, and not otherwise.

By combining two terms with an equality sign, we obtain a
proposition which is true or false depending on whether the indicated
equality is true or false. All such propositions constitute the set of
so-called elementary formulas of formal arithmetic. If the terms
composing the proposition contain variables, then it (this
proposition) will be a predicate, which is naturally termed an
elementary arithmetic predicate. From such elementary predicates, with
the aid of the operations of (restricted) predicate calculus
(including the operation of quantifier binding), more complex
arithmetic predicates are constructed. All the predicates which can be
constructed in this way are termed Godel arithmetic predicates.

The formulas of the formal arithmetic system which we have
constructed are limited to the formulas which can be constructed from
the elementary formulas with the aid of the operations of (restricted)
predicate calculus. The axiom system of the formal arithmetic is
obtained by supplementing the axiom system of (restricted) predicate
calculus (axioms 1-15) with the specific arithmetic axioms:

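16. $P(0) \wedge \forall x (P(x) \supset P(x')) \supset P(x)$ (axiom of mathematical induction).

17. $a' = b' \supset a = b$.

18. $\neg\, a' = 0$.

19. $a = b \supset (a = c \supset b = c)$.

20. $a = b \supset a' = b'$.

21. $a + 0 = a$.

22. $a + b' = (a + b)'$.

23. $a \cdot 0 = 0$.

24. $a \cdot b' = a \cdot b + a$.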


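Read from left to right, axioms 21-24 act as recursive definitions of addition and multiplication through the succession operation. The following Python sketch (function names ours) evaluates closed terms by applying these axioms as rewriting rules to numerals written as the zero symbol followed by strokes, as in $3 = 0'''$. It only computes values; it does not construct formal deductions in the system.

```python
def numeral(n):
    # The numeral for n: the zero symbol followed by n succession strokes.
    return "0" + "'" * n

def add(a, b):
    # Axiom 21: a + 0 = a.   Axiom 22: a + b' = (a + b)'.
    if b == "0":
        return a
    return add(a, b[:-1]) + "'"

def mul(a, b):
    # Axiom 23: a · 0 = 0.   Axiom 24: a · b' = a · b + a.
    if b == "0":
        return "0"
    return add(mul(a, b[:-1]), a)

print(add(numeral(2), numeral(3)))                 # 0'''''
print(mul(numeral(3), numeral(4)) == numeral(12))  # True
```

Removing the last stroke of a numeral `b` corresponds to writing it as $b = c'$ and recursing on $c$.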
For the proper understanding of these relations it is necessary 
to note that in order to economize brackets a definite order of prior- 
ity of operations is established in the formulas of the formal arith- 
metic system which we have constructed: all arithmetic operations (di- 
rect succession, multiplication and addition) have priority over 
equality, and the latter has priority over all the logical operations. 

Having the system of axioms (including the deduction rules 11, 12
and 15), we can transfer to the formal arithmetic the concepts of
(formal) demonstrability (deducibility) and nondeducibility of
formulas, and also the concepts of formal deduction, identically true
and identically false formulas, etc.

With the aid of these axioms we can, in a rigorously formal manner,
also establish the laws of arithmetic, such as the commutative and
associative laws for addition and multiplication, the distributive law
of multiplication with respect to addition, etc. We can prove the
validity of the relations $\vdash a + 0' = a'$, $\vdash a \cdot 0' = a$,
$\vdash a + b = 0 \supset a = 0 \wedge b = 0$,
$\vdash a \cdot b = 0 \supset a = 0 \vee b = 0$, and others.

Using axiom 16 we can derive the following general rule for proof
by the method of induction.

Let $\Gamma$ be a set of formulas of (formal) arithmetic which do not
contain the variable x as a free variable, and let P(x) be a formula
in which the variable x occurs free. Then, if $\Gamma \vdash P(0)$ and
$\Gamma, P(x) \vdash P(x')$ (without alteration of the free variables in
P(x)), then $\Gamma \vdash P(x)$.

Continuing the arguments in this fashion, we can in a rigorously
formal fashion prove all the basic theorems and justify all the basic
proof techniques used in the elementary arithmetic constructed by the
contensive method.

The resolution of such basic questions as the consistency and
completeness of the axiom system is much more complex in the case of
formal arithmetic than in restricted predicate calculus. Thus, the
proof of consistency requires going beyond the framework of strictly
finite methods. Completeness, in general, does not obtain for the
formal arithmetic system which we have constructed. Moreover,
incompleteness is retained for any consistent extension of this system
obtained by supplementing the axiom system we have written out with
any finite number of compatible (i.e., not leading to a contradiction)
new axioms. This is the sense of the celebrated Godel theorem on the
incompleteness of arithmetic, which forced a new look at the entire
problem of the foundations of mathematics and of the automatization
(on the basis of complete formalization) of the process of deducing
new theorems in deductively constructed theories.

In order to clarify the basic idea of the proof of the theorem on
the incompleteness of arithmetic, we must first become acquainted with
several concepts and auxiliary results. First of all we must formally
define the very concept of completeness.

To establish the class of deducible (demonstrable) formulas of
arithmetic it is sufficient to limit ourselves to the consideration
of closed formulas only. Indeed, as a result of the easily verifiable
relations of predicate calculus
$P(x_1, x_2, \ldots, x_n) \vdash \forall x_1 \forall x_2 \ldots \forall x_n P(x_1, x_2, \ldots, x_n)$
and
$\forall x_1 \forall x_2 \ldots \forall x_n P(x_1, x_2, \ldots, x_n) \vdash P(x_1, x_2, \ldots, x_n)$,
the demonstrability of any formula implies the demonstrability of its
closure, and vice versa.

A formal arithmetic system is termed (simply) complete if every
closed formula A is formally decidable, i.e., if one of the formulas
A or ¬A is a demonstrable formula.

If for the formula A its negation ¬A is demonstrable, then the
formula A itself is termed (formally) refutable. Formal decidability
of a closed formula thus means that this formula is either
demonstrable or refutable. Since from the naive contensive point of
view a closed formula must be either true or false, the condition of
its formal decidability is a very natural criterion for resolving the
question of the completeness of the corresponding formal system. The
basic idea of the proof of the theorem on the incompleteness of
arithmetic consists precisely in the actual construction of a formally
undecidable closed formula. For this construction we need several
auxiliary results, which we shall now consider.

In the contensive sense an arithmetic predicate is any (not 
necessarily constructively defined) predicate on the set of all whole 
nonnegative numbers. The formal arithmetic system which we have constructed
gives a method for the constructive specification of some
arithmetic predicates. To establish this method let us introduce the 
following definition. 

The arithmetic predicate $P(x_1, x_2, \ldots, x_n)$ is termed
numerically expressible if there exists a formula
$\mathcal{P}(x_1, x_2, \ldots, x_n)$ of the formal arithmetic system which
we are considering, not containing any free variables other than
$x_1, x_2, \ldots, x_n$, and such that for any concrete set of $n$ whole
nonnegative numbers $a_1, a_2, \ldots, a_n$ (written in the formal system
as the corresponding numerals) the following conditions are satisfied:

1) if the proposition $P(a_1, a_2, \ldots, a_n)$ is true, then
$\vdash \mathcal{P}(a_1, a_2, \ldots, a_n)$;

2) if the proposition $P(a_1, a_2, \ldots, a_n)$ is false, then
$\vdash \neg \mathcal{P}(a_1, a_2, \ldots, a_n)$.

In this case we say that the formula $\mathcal{P}(x_1, x_2, \ldots, x_n)$
numerically expresses the predicate $P(x_1, x_2, \ldots, x_n)$.

It is easy to show that the arithmetic predicates $x = y$ and $x < y$
are numerically expressed by the formulas $x = y$ and
$\exists z (z' + x = y)$ respectively.
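That the formula $\exists z (z' + x = y)$ has the intended contensive meaning can be checked by a bounded search, since any witness satisfies $z' \le y$. The Python sketch below (names ours) checks only this contensive truth on a finite range of values; it says nothing about formal deducibility, which is what conditions 1) and 2) of the definition actually require.

```python
def exists_z(x, y):
    # Contensive value of ∃z (z' + x = y). A witness must satisfy z' ≤ y,
    # so searching z = 0, ..., y is enough and the search terminates.
    return any((z + 1) + x == y for z in range(y + 1))

# The formula agrees with the predicate x < y on every pair tested.
print(all(exists_z(x, y) == (x < y) for x in range(30) for y in range(30)))  # True
```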

The formula $\mathcal{P}(x_1, x_2, \ldots, x_n)$ which we considered in
the example just presented is decidable for any concrete set of values
of its free variables $x_1, x_2, \ldots, x_n$. Formulas having this
property are customarily termed numerically decidable formulas. In
light of what has been said, the truth of a predicate numerically
expressible by such a formula can be verified by a constructive method
for any set of values of the object variables. In the formal
arithmetic system which we have constructed each formula without
variables is decidable, and each formula without quantifiers is a
numerically decidable formula.

The concept of numerical expressibility can be established not
only for arithmetic predicates, but also for (contensively defined)
arithmetic functions (whose values are whole nonnegative numbers). We
say that the formula $\mathcal{P}(x_1, x_2, \ldots, x_n, y)$ of a formal
arithmetic system numerically represents the arithmetic function
$f(x_1, x_2, \ldots, x_n)$ if for any set of $n$ whole nonnegative
numbers $a_1, a_2, \ldots, a_n$ the following conditions are satisfied:

1) if $f(a_1, a_2, \ldots, a_n) = b$, then
$\vdash \mathcal{P}(a_1, a_2, \ldots, a_n, b)$;

2) $\vdash \exists y (\mathcal{P}(a_1, a_2, \ldots, a_n, y) \wedge \forall z (\mathcal{P}(a_1, a_2, \ldots, a_n, z) \supset z = y))$.

The second of these conditions is the condition, expressed in the
language of predicate calculus, of the uniqueness of the specification
of the function $f$ with the aid of the formula $\mathcal{P}$.

We note that the possibility of effective specification of
predicates is not necessarily associated with the use of the apparatus
of formal arithmetic. Any n-place arithmetic predicate
$P(x_1, x_2, \ldots, x_n)$ can be specified with the aid of the n-place
arithmetic function $\varphi(x_1, x_2, \ldots, x_n)$ which takes the value
0 on all sets of values of the variables $x_1, x_2, \ldots, x_n$ on which
the predicate P is true, and the value 1 on all those sets on which
the predicate P is false. This function is termed the representative
function of the considered predicate P. A predicate whose
representative function is primitive recursive or general recursive
is termed respectively a primitive recursive or general recursive
predicate.
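As an illustration, the predicate $x < y$ is primitive recursive. With truncated subtraction $x \mathbin{\dot-} y$ (equal to $x - y$ when $x \ge y$ and to 0 otherwise), its representative function in the convention of the text (0 for true, 1 for false) can be written as $1 \mathbin{\dot-} (y \mathbin{\dot-} x)$. The Python sketch below is our own example; the names `pred`, `monus` and `rep_less` are hypothetical, and primitive recursion on $y$ is unrolled into a loop.

```python
def pred(x):
    # Predecessor: pred(0) = 0, pred(y') = y.
    return 0 if x == 0 else x - 1

def monus(x, y):
    # Truncated subtraction x ∸ y by primitive recursion on y:
    # x ∸ 0 = x,  x ∸ y' = pred(x ∸ y).
    result = x
    for _ in range(y):
        result = pred(result)
    return result

def rep_less(x, y):
    # Representative function of x < y in the convention of the text:
    # 0 where the predicate is true, 1 where it is false.
    return monus(1, monus(y, x))

print([rep_less(x, 3) for x in range(6)])  # [0, 0, 0, 1, 1, 1]
```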

Godel has established the following result. 

Theorem 1. If the arithmetic function $\varphi(x_1, x_2, \ldots, x_n)$ is
primitive recursive, then the $(n + 1)$-place predicate
$\varphi(x_1, x_2, \ldots, x_n) = y$ is Godel arithmetic, i.e., expressible
by means of formal arithmetic.

From this theorem it follows, in particular, that all primitive 
recursive predicates are Godel arithmetic predicates. 

With the aid of theorem 1 we can easily establish the validity of
the following proposition.

Theorem 2. Every primitive recursive predicate is numerically
expressible in formal arithmetic.

The following result on general recursive predicates has been
established by Kleene [42] and Post.

Theorem 3. For any $n > 0$ every general recursive predicate
$P(x_1, x_2, \ldots, x_n)$ can be represented both in the form
$\exists y R(x_1, x_2, \ldots, x_n, y)$ and in the form
$\forall y S(x_1, x_2, \ldots, x_n, y)$, where the predicates R and S are
primitive recursive. On the other hand, every predicate which is
representable in each of these two forms is general recursive, and it
remains general recursive also in the case when the predicates R and S
are not primitive recursive, but only general recursive.
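The second half of Theorem 3 has a simple algorithmic reading: given both representations, search $y = 0, 1, 2, \ldots$ until either $R(x, y)$ holds (then $P(x)$ is true) or $S(x, y)$ fails (then $P(x)$ is false). One of the two must eventually happen, so the search always stops. The Python sketch below is an illustration under our own assumptions; R and S are passed as ordinary Python functions, and the predicate "x is even" is our own example.

```python
from itertools import count

def decide(x, R, S):
    """Decide P(x) when P(x) ⟺ ∃y R(x, y) and also P(x) ⟺ ∀y S(x, y).
    If P(x) is true some y satisfies R; if it is false some y violates S,
    so the search over y = 0, 1, 2, ... always stops."""
    for y in count():
        if R(x, y):
            return True
        if not S(x, y):
            return False

# Illustrative predicate "x is even": ∃y (2y = x) and ∀y (2y + 1 ≠ x).
R = lambda x, y: 2 * y == x
S = lambda x, y: 2 * y + 1 != x
print([decide(x, R, S) for x in range(6)])  # [True, False, True, False, True, False]
```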

The following theorem due to Kleene [42] is also valid.

Theorem 4. All general recursive predicates are Godel arithmetic.

For various sorts of complex constructions and proofs in formal
arithmetic it is advisable to introduce a special numbering for all 
its formulas and proofs of these formulas (apparatus of the formal 
arithmetic system which we have constructed). Such a numeration was 
proposed by Godel and therefore is termed Godel numeration. There are 
many different ways of accomplishing this. Let us consider one of them. 

Before defining the numbering of the formulas it is advisable to alter somewhat the way they are written, treating as formal objects, termed entities, not only the formulas themselves but also their individual parts. The elementary entities are, first, all the logical symbols ($\supset, \wedge, \vee, \neg, \forall, \exists$), the equality symbol (=), the symbols for the arithmetic operations (+, ·, '), the zero symbol (0), and the two symbols used to designate the different object variables (x, |). To the different object variables x, y, z, ... there are associated the entities x, (|, x), (|, (|, x)), etc. To the terms and formulas of the form r + s, r · s, r', r = s, $A \supset B$, $A \vee B$, $A \wedge B$, $\neg A$, $\forall u A(u)$, $\exists u A(u)$ there are associated the respective entities (+, r, s), (·, r, s), (', r), (=, r, s), ($\supset$, A, B), ($\vee$, A, B), ($\wedge$, A, B), ($\neg$, A), ($\forall$, u, A(u)), ($\exists$, u, A(u)). For simplicity of notation, the symbols r, s, A, B, u, A(u) and the entities corresponding to them are not distinguished here. Using these definitions, we can construct step by step the entities corresponding to the various formulas and their component parts. For example, the formula $\exists x(0' + x = y)$ corresponds to the entity ($\exists$, x, (=, (+, (', 0), x), (|, x))), the term 0' + x corresponds to the entity (+, (', 0), x), the formula x = y corresponds to the entity (=, x, (|, x)), etc.
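To make the passage from formulas to entities concrete, here is a minimal Python sketch under an assumed encoding: an elementary entity is a one-character string and a compound entity is a tuple, written in the same order as (+, r, s) or ($\exists$, u, A(u)). The helper names (`variable`, `prime`, `plus`, `equals`, `exists`) are hypothetical. The sketch rebuilds the entity of the example formula $\exists x(0' + x = y)$.

```python
def variable(k):
    """Entity of the k-th object variable: x -> x, y -> (|, x), z -> (|, (|, x)), ..."""
    entity = "x"
    for _ in range(k):
        entity = ("|", entity)
    return entity


def prime(r):        # the term r'
    return ("'", r)


def plus(r, s):      # the term r + s
    return ("+", r, s)


def equals(r, s):    # the formula r = s
    return ("=", r, s)


def exists(u, a):    # the formula ∃u A(u)
    return ("∃", u, a)


x, y = variable(0), variable(1)
ZERO = "0"

# The formula ∃x(0' + x = y) from the text.
example = exists(x, equals(plus(prime(ZERO), x), y))
print(example)
```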

Now let us assign to the elementary entities various odd numbers (the Godel numbers of these entities):

⊃    ∧    ∨    ¬    ∀    ∃    =    +    ·    '    0    x    |
3    5    7    9    11   13   15   17   19   21   23   25   27


Designating by $p_0, p_1, p_2, \dots$ the sequence of all prime numbers ($p_0 = 2$, $p_1 = 3$, $p_2 = 5$, etc.) and assuming that the entities $a_0, a_1, \dots, a_n$ are already associated with the Godel numbers $m_0, m_1, \dots, m_n$, let us associate the entity $(a_0, a_1, \dots, a_n)$ with the Godel number $p_0^{m_0} p_1^{m_1} \cdots p_n^{m_n}$. Associating with each formula of the formal arithmetic system we are considering the Godel number of its corresponding entity, we obtain the sought Godel numbering of all these formulas. For example, the formula x = y, to which there corresponds the entity (=, x, (|, x)), has the Godel number $2^{15} \cdot 3^{25} \cdot 5^{2^{27} \cdot 3^{25}}$; to the term 0' + x with its corresponding entity (+, (', 0), x) there must be assigned the Godel number $2^{17} \cdot 3^{2^{21} \cdot 3^{23}} \cdot 5^{25}$, etc.
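The numbering rule can be sketched in Python. Because the actual integers are far too large to compute (the exponent of 5 in the number of x = y is already $2^{27} \cdot 3^{25}$), this hypothetical helper, `godel_expr`, returns the Godel number as a product of prime powers written out as text rather than as an integer. The string-and-tuple encoding of entities is an assumption made for illustration.

```python
# Godel numbers of the elementary entities, as in the table above.
CODE = {"⊃": 3, "∧": 5, "∨": 7, "¬": 9, "∀": 11, "∃": 13, "=": 15,
        "+": 17, "·": 19, "'": 21, "0": 23, "x": 25, "|": 27}

# p0, p1, p2, ...; ten primes are ample for the short entities used here.
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]


def godel_expr(entity):
    """Godel number of an entity, written as a product of prime powers.

    An entity is an elementary symbol (a key of CODE) or a tuple (a0, ..., an);
    the tuple gets p0^m0 · p1^m1 · ... · pn^mn, where mi is the number of ai.
    """
    if isinstance(entity, str):
        return str(CODE[entity])
    if len(entity) > len(PRIMES):
        raise ValueError("entity too long for the prime table")
    factors = []
    for p, part in zip(PRIMES, entity):
        exponent = godel_expr(part)
        if not exponent.isdigit():        # compound exponent: keep it grouped
            exponent = "(" + exponent + ")"
        factors.append(f"{p}^{exponent}")
    return "·".join(factors)


print(godel_expr(("=", "x", ("|", "x"))))   # prints: 2^15·3^25·5^(2^27·3^25)
print(godel_expr(("+", ("'", "0"), "x")))   # prints: 2^17·3^(2^21·3^23)·5^25
```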

It is easy to see that the Godel number of every nonelementary entity is necessarily even (it contains the factor $p_0 = 2$ to a positive power), whereas the elementary entities have odd numbers. Keeping this fact in mind, and also the uniqueness of the expansion of every natural number into prime factors, it is not difficult to see that from the Godel number we can uniquely recover its corresponding formula. This means that the correspondence between the formulas of the formal arithmetic system we are considering and their Godel numbers is one-to-one. Thus, when necessary, we can work with the Godel numbers of the formulas of this system rather than with the formulas themselves.
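The recovery argument can be made concrete with a short Python sketch, under the same assumed tuple encoding of entities. `godel_number` computes the actual integer, which is feasible only for very small entities such as (', 0) or (|, x); `decode` inverts it by reading off the exponents of 2, 3, 5, ... and decoding each exponent recursively, using the fact that odd numbers belong to elementary entities. Both function names are hypothetical.

```python
from math import isqrt

CODE = {"⊃": 3, "∧": 5, "∨": 7, "¬": 9, "∀": 11, "∃": 13, "=": 15,
        "+": 17, "·": 19, "'": 21, "0": 23, "x": 25, "|": 27}
SYMBOL = {n: s for s, n in CODE.items()}


def next_prime(p):
    """Smallest prime greater than p (trial division, fine for small p)."""
    q = p + 1
    while any(q % d == 0 for d in range(2, isqrt(q) + 1)):
        q += 1
    return q


def godel_number(entity):
    """Integer Godel number; only feasible while all exponents stay small."""
    if isinstance(entity, str):
        return CODE[entity]
    n, p = 1, 2
    for part in entity:
        n *= p ** godel_number(part)
        p = next_prime(p)
    return n


def decode(n):
    """Recover the entity from its Godel number (inverse of godel_number)."""
    if n % 2 == 1:                  # odd: an elementary entity
        return SYMBOL[n]            # KeyError if n is not in the table
    parts, p = [], 2
    while n > 1:
        e = 0
        while n % p == 0:           # exponent of p in the factorization
            n //= p
            e += 1
        if e == 0:                  # a skipped prime cannot occur in a Godel number
            raise ValueError("not the Godel number of an entity")
        parts.append(decode(e))
        p = next_prime(p)
    return tuple(parts)


n = godel_number(("'", "0"))        # the term 0'
print(n, decode(n))
```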

By analogy with the Godel numbering of the formulas of arithmetic, we can introduce a Godel numbering of all possible finite sequences of such formulas, among which there will be, in particular, the proofs of all the demonstrable arithmetic formulas.

For any whole nonnegative number a, understood in the informal (contentual) sense, we shall use $\bar a$ to designate the representation of this number in formal arithmetic ($\bar a$ is the symbol 0 followed by a primes). Fixing some Godel numbering of the arithmetic formulas, for any Godel number n we shall use $P_n$ to denote the formula which has the number n in our numbering. We single out in the formula $P_n$ the variable x (on which the formula may actually not depend), writing the formula as $P_n(x)$.

Let us now define the two arithmetic predicates A(a, b) and B(a, b). The first predicate is considered true if and only if the number a is the Godel number of a formula $P_a(x)$ such that the formula $P_a(\bar a)$ is demonstrable, and the number b is the Godel number of some proof of it.

Similarly, the predicate B(a, b) is considered true if and only if the number a is the Godel number of a formula $P_a(x)$ for which the formula $P_a(\bar a)$ is refutable, and the number b is the Godel number of a proof of the formula $\neg P_a(\bar a)$.

Using the theorems formulated above, we can prove the following important lemma.

Lemma. The arithmetic predicates A(a, b) and B(a, b) defined above (for the Godel numbering fixed above) are numerically expressible in the formal arithmetic system with the axioms 1-24.

Let us construct the formulas $A_1(a, b)$ and $B_1(a, b)$ which numerically express the predicates A(a, b) and B(a, b) respectively, and let us consider the formula $\forall y \neg A_1(x, y)$. Let p be the Godel number of this formula; it therefore coincides with the formula which we agreed to denote by $P_p(x)$. Now let us consider the formula $P_p(\bar p)$, which has no free variables. This formula, written explicitly as $\forall y \neg A_1(\bar p, y)$, can be regarded as a proposition asserting its own nondemonstrability. Indeed, this proposition states that no number is the Godel number of a proof of the formula obtained from $P_p(x)$ by replacing x by $\bar p$. But this replacement is exactly what transforms the formula $P_p(x)$ into the formula $\forall y \neg A_1(\bar p, y)$ which we have constructed.

It turns out that, under some additional assumptions, these properties of the formula $\forall y \neg A_1(\bar p, y)$ imply its undecidability, which proves the incompleteness of the formal arithmetic system we have constructed. The additional assumption involved here is that formal arithmetic is ω-consistent.

By the ω-consistency of the formal arithmetic system we mean the following property: there is no formula P(x) for which the formula $\neg \forall x P(x)$ is demonstrable while at the same time all the formulas $P(\bar 0), P(\bar 1), P(\bar 2), \dots$ are demonstrable.

The ω-consistency of formal arithmetic implies its simple consistency. Indeed, let P be any demonstrable formula not containing free variables (for example, the formula 0 = 0). Introducing into this formula the dummy variable x, on which P actually does not depend, we write it in the form P(x). Then all the formulas $P(\bar 0), P(\bar 1), \dots$ coincide with P and, consequently, are demonstrable. By the ω-consistency of the system this means that the formula $\neg \forall x P(x)$, which actually coincides with the formula $\neg P$, is nondemonstrable. However, if the system contained a contradiction, then by the property of weak $\neg$-removal (see formula (7) of §5, Chapter 2) all the formulas of this system would be demonstrable. Since the formula $\neg P$ is nondemonstrable, our system is (simply) consistent.

Now we can prove the Godel theorem on the incompleteness of arithmetic in its primitive (weak) form.

Theorem 5. If formal arithmetic is ω-consistent, then the formula $\forall y \neg A_1(\bar p, y)$ constructed above is an example of a nondecidable formula.

Proof. Let us assume first that the formula $\forall y \neg A_1(\bar p, y)$ is demonstrable. We use k to denote the Godel number of a proof of this formula. Then the proposition A(p, k) is true and, consequently, the formula $A_1(\bar p, \bar k)$ is deducible. Using the operation of $\exists$-insertion (see §1 of the present chapter) we obtain $\vdash \exists y A_1(\bar p, y)$, or, using formula (125), $\vdash \neg \forall y \neg A_1(\bar p, y)$. But then, together with the assumption made above, the formal arithmetic system we are considering would be (simply) inconsistent, which is excluded by its ω-consistency.

Let us now assume that the formula $\neg \forall y \neg A_1(\bar p, y)$ is demonstrable. As has just been proved, the formula $\forall y \neg A_1(\bar p, y)$ is nondemonstrable. Therefore none of the numbers 0, 1, 2, ... is the Godel number of a proof of the latter formula. This means that all the propositions A(p, 0), A(p, 1), A(p, 2), ... are false and, consequently, in view of the numerical expressibility of the predicate A, all the formulas of the form $\neg A_1(\bar p, \bar i)$ are deducible for i = 0, 1, 2, .... But then, by the ω-consistency of the system, the formula $\neg \forall y \neg A_1(\bar p, y)$ is nondemonstrable, which contradicts our assumption that it is demonstrable.

Thus, the formula $\forall y \neg A_1(\bar p, y)$ can be neither demonstrable nor refutable and, consequently, is an example of a nondecidable formula. This proves the theorem.

As this proof shows, to establish the nondemonstrability of the formula we have constructed, the assumption of the simple consistency of the formal arithmetic system is sufficient; the assumption of ω-consistency is used only to prove the nonrefutability of this formula.

Rosser has shown [71] that one can construct a formula whose nondecidability is established without assuming the ω-consistency of formal arithmetic; its simple consistency is sufficient. The formula is constructed as follows. First, from the predicates A(a, b) and B(a, b) defined above we construct the formula $\forall y(\neg A_1(x, y) \vee \exists z(z < y \wedge B_1(x, z)))$ (where the formulas $A_1$ and $B_1$ numerically express the predicates A and B respectively). If we designate this formula by $P_q(x)$ (q is its Godel number), then the formula $P_q(\bar q)$ is the desired example of a formula whose nondecidability is established using only the assumption of the simple consistency of formal arithmetic. The validity of this last assumption was established by Ackermann and von Neumann under a certain restriction, and by Gentzen in the general case.

Novikov [59] has proved not only the simple consistency but even the ω-consistency of arithmetic, although this required methods going beyond the framework of formal arithmetic itself. From this result and theorem 5 there follows the Godel theorem on the incompleteness of arithmetic:

Theorem 6. The formal arithmetic system with the axioms 1-24 is 
incomplete in the sense that in it there are constant propositions 
(formulas which do not contain free variables) which cannot be proved 
or refuted using the apparatus of this system. 



One might think that the Godel result reveals only an insufficient completeness of the axiom system we chose for formal arithmetic, and that by suitably supplementing this system with new axioms the incompleteness of arithmetic could be removed while its consistency is retained. In actuality the matter is far from being this simple. As Godel's detailed analysis showed, under any consistent (effectively specified) extension of the axiom system formal arithmetic remains incomplete, and there will still be nondecidable closed formulas in it. Moreover, every formal system which satisfies certain quite general conditions (the existence of a sufficiently extensive set of formulas and objects) will necessarily be incomplete if it is consistent.

As mentioned above, the proof of the consistency of the formal arithmetic system S which we have constructed required means going beyond the framework of this system. This is no accident: it can be shown that the consistency of the system S cannot be proved by the apparatus formalized in this very system.

Indeed, the consistency of the system S is equivalent to the nondemonstrability in S of the formula 1 = 0. If the system is inconsistent, all its formulas, and in particular the formula 1 = 0, become demonstrable. The converse is also true: the demonstrability of the formula 1 = 0 implies the inconsistency of the system S. Let r be the Godel number of the formula 1 = 0. Then the formula $\neg \exists y A_1(\bar r, y)$, which for brevity we denote by $\mathfrak{A}$, is, by the definition given above of the predicate A(x, y) and its numerical expression $A_1(x, y)$, the formal expression of the consistency of the system S.

It can be shown that the formalization (in S) of the proof of theorem 5 reduces to the deduction (in S) of the formula $\mathfrak{A} \supset \forall y \neg A_1(\bar p, y)$, and the formalization (in S) of a proof of the consistency of S would reduce to the deduction (in S) of the formula $\mathfrak{A}$. But if both of these deductions existed, then by the deduction rule expressed by axiom 11 of propositional calculus (see Chapter 2, §5) the formula $\forall y \neg A_1(\bar p, y)$ would also be deducible (demonstrable). Since this contradicts theorem 5, the formula $\mathfrak{A}$ cannot be demonstrable in the system S, which shows that the consistency of the formal arithmetic system cannot be proved by the apparatus of this system itself.


§3. CONCEPT OF AUTOMATION OF PROOFS AND CONSTRUCTION OF DEDUCTIVE
THEORIES 


The formal arithmetic system constructed in the preceding section is an example of the formalization of a mathematical theory on the basis of the predicate calculus. Such formalization makes it possible to decompose the process of proving any proposition demonstrable within the given theory into precisely defined elementary steps. By placing in the program of a universal electronic digital machine all the axioms and derivation rules of the theory, together with the formula expressing the proposition to be proved, we can organize a random search for a proof of this formula.

If the number of elementary steps needed to prove the required formula is relatively small, then the high speed of the electronic digital machine makes it possible to find the proof by simply enumerating all the variants. However, for any complex proposition this method of searching for a proof becomes unsuitable in practice, because the number of variants to be enumerated becomes enormous, so that their complete enumeration in a reasonable time is impossible even on modern high-speed electronic digital machines. Under these conditions we must use various techniques which sharply reduce the number of variants to be enumerated. Such techniques include the enlargement of the deduction rules, thanks to which the proof is built from larger blocks and as a result becomes considerably shorter. Another technique for shortening the enumeration consists in the development of various heuristic methods which make it possible to set intermediate goals and thereby break the proof search down into individual stages. These stages must be small enough for complete enumeration within each of them to be possible.

In the automation of proofs one usually prefers a formalization of the predicate calculus somewhat different from the one developed in the first section of the present chapter. Gentzen proposed such a formalization in [15]. It permits, in a certain sense, a normalization of the process of proof using the formal apparatus of the predicate calculus. We shall present certain basic features of the Gentzen system of formalization of predicate calculus, which, in contrast with the previously considered so-called Hilbert system H, we shall designate by G or, more precisely, by G1.

One of the significant concepts in the Gentzen system is the concept of the so-called sequence. A sequence is a formal expression of the form $A_1, A_2, \dots, A_m \to B_1, B_2, \dots, B_n$, where $A_i$ and $B_j$ are formulas and the arrow denotes a new formal symbol. The sequence $A_1, \dots, A_m \to B_1, \dots, B_n$ has the same interpretation as the formula $A_1 \wedge A_2 \wedge \dots \wedge A_m \supset B_1 \vee B_2 \vee \dots \vee B_n$ in the Hilbert system, where the conjunction of an empty set of formulas is considered true and the disjunction of an empty set of formulas is considered false.
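This interpretation can be checked mechanically for propositional formulas. The following Python sketch is our own illustration (the tuple representation of formulas and the function names are hypothetical): it evaluates a sequence as the implication from the conjunction of its left part to the disjunction of its right part, and tests validity by running through all truth assignments.

```python
from itertools import product

# Hypothetical formula representation: ("var", "A"), ("not", F), ("and", F, G),
# ("or", F, G), ("imp", F, G).


def letters(formula, acc):
    """Collect the propositional letters occurring in a formula."""
    if formula[0] == "var":
        acc.add(formula[1])
    else:
        for sub in formula[1:]:
            letters(sub, acc)
    return acc


def value(formula, v):
    """Truth value of a formula under the assignment v (dict letter -> bool)."""
    op = formula[0]
    if op == "var":
        return v[formula[1]]
    if op == "not":
        return not value(formula[1], v)
    if op == "and":
        return value(formula[1], v) and value(formula[2], v)
    if op == "or":
        return value(formula[1], v) or value(formula[2], v)
    if op == "imp":
        return (not value(formula[1], v)) or value(formula[2], v)
    raise ValueError(f"unknown connective {op!r}")


def sequence_holds(ante, succ, v):
    """A1, ..., Am → B1, ..., Bn read as A1 ∧ ... ∧ Am ⊃ B1 ∨ ... ∨ Bn.

    all([]) is True and any([]) is False, matching the conventions for the
    empty conjunction and the empty disjunction.
    """
    return (not all(value(a, v) for a in ante)) or any(value(b, v) for b in succ)


def sequence_valid(ante, succ):
    """True if the sequence holds under every assignment to its letters."""
    names = sorted(set().union(*(letters(f, set()) for f in ante + succ)))
    return all(sequence_holds(ante, succ, dict(zip(names, bits)))
               for bits in product([False, True], repeat=len(names)))


A, B = ("var", "A"), ("var", "B")
print(sequence_valid([("not", ("or", A, B))], [("not", A)]))  # prints: True
```

Under this reading the empty sequence → comes out false, since its left part is true and its right part is false.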
The part of the sequence standing to the left of the symbol $\to$ is termed the antecedent, and the part standing to the right of this symbol is termed the succeedent of the sequence. For brevity, finite sequences of formulas are denoted by capital Greek letters ($\Gamma, \Theta, \Delta, \Lambda$, etc.) and individual formulas by capital Latin letters.


In the Gentzen system G1 there is the natural axiom (axiom scheme)

$C \to C$,   (130)


and also a whole series of deduction rules which are divided into 
rules of deduction for propositional calculus and additional rules for 
deduction for predicate calculus. 


The deduction rules for propositional calculus: 


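$\dfrac{A, \Gamma \to \Theta, B}{\Gamma \to \Theta, A \supset B}$   ($\supset$-insertion in succeedent)   (131)

$\dfrac{\Delta \to \Lambda, A \qquad B, \Gamma \to \Theta}{A \supset B, \Delta, \Gamma \to \Lambda, \Theta}$   ($\supset$-insertion in antecedent)   (132)

$\dfrac{\Gamma \to \Theta, A \qquad \Gamma \to \Theta, B}{\Gamma \to \Theta, A \wedge B}$   ($\wedge$-insertion in succeedent)   (133)

$\dfrac{A, \Gamma \to \Theta}{A \wedge B, \Gamma \to \Theta}$,   $\dfrac{B, \Gamma \to \Theta}{A \wedge B, \Gamma \to \Theta}$   ($\wedge$-insertion in antecedent)   (134)

$\dfrac{\Gamma \to \Theta, A}{\Gamma \to \Theta, A \vee B}$,   $\dfrac{\Gamma \to \Theta, B}{\Gamma \to \Theta, A \vee B}$   ($\vee$-insertion in succeedent)   (135)

$\dfrac{A, \Gamma \to \Theta \qquad B, \Gamma \to \Theta}{A \vee B, \Gamma \to \Theta}$   ($\vee$-insertion in antecedent)   (136)

$\dfrac{A, \Gamma \to \Theta}{\Gamma \to \Theta, \neg A}$   ($\neg$-insertion in succeedent)   (137)

$\dfrac{\Gamma \to \Theta, A}{\neg A, \Gamma \to \Theta}$   ($\neg$-insertion in antecedent)   (138)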

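Additional rules of deduction for predicate calculus:

$\dfrac{\Gamma \to \Theta, A(b)}{\Gamma \to \Theta, \forall x A(x)}$   ($\forall$-insertion in succeedent)   (139)

$\dfrac{A(t), \Gamma \to \Theta}{\forall x A(x), \Gamma \to \Theta}$   ($\forall$-insertion in antecedent)   (140)

$\dfrac{\Gamma \to \Theta, A(t)}{\Gamma \to \Theta, \exists x A(x)}$   ($\exists$-insertion in succeedent)   (141)

$\dfrac{A(b), \Gamma \to \Theta}{\exists x A(x), \Gamma \to \Theta}$   ($\exists$-insertion in antecedent)   (142)

Here b is a variable, and in rules (140) and (141) t is a term free for x in A(x).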
In rules (139) and (142) a definite restriction must be observed: the variable b must not occur free in the conclusions (i.e., in the expressions under the bar) of these rules.

We note that when formula A(x) does not actually contain the free 
variable x, then A(b) coincides with A(x). In this case the variable 
b can be arbitrary so that as b we can always select a variable which 
does not occur in the conclusion and can thereby observe the required 
limitation. 

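In addition to the rules above, the Gentzen system has seven more so-called structural rules of deduction:

$\dfrac{\Gamma \to \Theta}{\Gamma \to \Theta, C}$   (refinement in succeedent)   (143)

$\dfrac{\Gamma \to \Theta}{C, \Gamma \to \Theta}$   (refinement in antecedent)   (144)

$\dfrac{\Gamma \to \Theta, C, C}{\Gamma \to \Theta, C}$   (abbreviation in succeedent)   (145)

$\dfrac{C, C, \Gamma \to \Theta}{C, \Gamma \to \Theta}$   (abbreviation in antecedent)   (146)

$\dfrac{\Gamma \to \Theta, C, D, \Lambda}{\Gamma \to \Theta, D, C, \Lambda}$   (permutation in succeedent)   (147)

$\dfrac{\Delta, D, C, \Gamma \to \Theta}{\Delta, C, D, \Gamma \to \Theta}$   (permutation in antecedent)   (148)

$\dfrac{\Delta \to \Lambda, C \qquad C, \Gamma \to \Theta}{\Delta, \Gamma \to \Lambda, \Theta}$   (section)   (149)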


To denote the demonstrability of the sequence S in the system G1 we use the abbreviated notation $\vdash S$, similar to the corresponding notation in the Hilbert system H.

The Gentzen system G1 is in a certain sense equivalent to the Hilbert system H since, as Gentzen showed, the following theorem is valid:

Theorem 1. If the formula A is deducible from the finite set of formulas Γ in the Hilbert system H and all the variables remain fixed, then the sequence $\Gamma \to A$ is deducible in the Gentzen system G1. Conversely, if the sequence $\Gamma \to A$ is deducible in G1, then the formula A is deducible from the set of formulas Γ, and in this case all the variables remain fixed.

The similarity between the Hilbert and Gentzen systems is so great that if a deduction in one system uses the rules (axioms) for only some of the logical operations $\supset, \wedge, \vee, \neg, \forall, \exists$, then the corresponding deduction in the other system can be confined to the rules for the same symbols, with the possible exception of the implication symbol $\supset$.

Gentzen established a result which makes it possible to eliminate from proofs in the system G1 the use of sections (deduction rule (149)). This is the so-called Gentzen normal form theorem, or elimination theorem.

Theorem 2. Suppose that in the system G1 there is given a proof of some sequence in which no variable occurs both free and bound. Then there is a proof of the same sequence in G1 which does not use sections (rule (149)) and uses only those logical rules which were used in the original proof.

Along with the system G1, Gentzen has also constructed other for- 
mal systems (the systems G2, G3). 

Hao Wang [77] used the Gentzen system G1 to automate (with the aid of a universal electronic digital machine) the proof of a large number of theorems not only of propositional calculus but also of the (restricted) predicate calculus. Wang's experiments showed that, in spite of the absence of a universal decision procedure for the predicate calculus, one can construct a partial decision procedure which proves all the theorems usually included in a handbook of mathematical logic.

In the case of propositional calculus, to prove a particular sequence Hao Wang uses the deduction rules (131)-(138) of the Gentzen system (supplemented by rules for inserting the equivalence symbol into the succeedent and the antecedent) in the reverse direction (the conclusion is replaced by the premise). The logical connectives are thereby eliminated one after another, beginning with the left end of the sequence. After a finite number of steps this procedure yields sequences of the form $A_1, A_2, \dots, A_m \to B_1, B_2, \dots, B_n$, where $A_i$ and $B_j$ are so-called atomic formulas, i.e., simply speaking, propositional letters. Such sequences, which are naturally termed elementary, are demonstrable if and only if the same atomic formula occurs in both their left and right parts.

If all the elementary sequences obtained as a result of this pro- 
cedure are demonstrable, then the original sequence is obviously de- 
monstrable. For its proof it is sufficient to repeat all the steps 
which led to the appearance of the indicated elementary sequences, in 
the reverse order. 

If the theorem to be proved is written as a formula of the Hilbert system H, then to convert it to a sequence it suffices to place an arrow in front of it. If the last operation performed in the formula is implication, the formula can instead be converted to a sequence by replacing the corresponding symbol $\supset$ by an arrow. This method usually leads to a shorter proof than writing the arrow in front of the formula. By the definition of the meaning of the symbol $\to$ in a sequence and theorem 1 of the present section, a proof of the sequence obtained by either of these two methods is also a proof of the original formula.

Let us consider as an example the formula $\neg(A \vee B) \supset \neg A$ of propositional calculus. Replacing the implication symbol by the arrow, we convert it to the sequence $\neg(A \vee B) \to \neg A$. The leftmost logical connective is the negation symbol $\neg$. Reversal of the rule for $\neg$-insertion into the antecedent (rule (138)) brings our sequence to the form $\to \neg A, A \vee B$. Elimination of the next logical connective (again a negation) by reversal of rules (147) and (137) leads to the sequence $A \to A \vee B$. Finally, reversal of rule (135) brings our sequence to the form $A \to A, B$, which is an elementary sequence. Since the letter A occurs in both the left and right parts of the last sequence, the sequence is demonstrable. Writing out in the reverse order all the steps which led us to the sequence $A \to A, B$, we obtain a proof of the original sequence $\neg(A \vee B) \to \neg A$.

For a proper understanding of the last step in this example of the sequential elimination of logical connectives, we note that deduction rule (135) can (as Hao Wang does) be written in the form

$\dfrac{\Gamma \to \Theta, A, B}{\Gamma \to \Theta, A \vee B}$   (150)

Similarly, in rule (134) for $\wedge$-insertion into the antecedent the two premises can be replaced by one premise of the form $\Gamma, A, B \to \Theta$. The legitimacy of these changes in rules (134) and (135) is easily justified with the aid of the rules for refinement in the succeedent and antecedent (rules (143) and (144)), together with the permutation and abbreviation rules.
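The propositional procedure just described can be sketched in Python as follows. This is our reconstruction, not Wang's program: formulas are represented by hypothetical tuples, equivalence is omitted, and the one-premise forms described above are used for (134) and (135). Rule (132) is reversed with the whole context copied to both premises, which the abbreviation and permutation rules justify; taking the chosen formula out of its list corresponds to the permutation rules (147) and (148).

```python
# Hypothetical representation: ("var", "A"), ("not", F), ("and", F, G),
# ("or", F, G), ("imp", F, G); a sequence is a pair of lists (ante, succ).


def provable(ante, succ):
    """Decide ante → succ in propositional calculus by reversing rules (131)-(138)."""
    # Reduce the leftmost compound formula of the antecedent first.
    for i, f in enumerate(ante):
        if f[0] != "var":
            rest = ante[:i] + ante[i + 1:]
            op = f[0]
            if op == "not":   # reverse (138): ¬A, Γ → Θ from Γ → Θ, A
                return provable(rest, succ + [f[1]])
            if op == "and":   # reverse (134), one-premise form: A, B, Γ → Θ
                return provable([f[1], f[2]] + rest, succ)
            if op == "or":    # reverse (136): two premises
                return (provable([f[1]] + rest, succ)
                        and provable([f[2]] + rest, succ))
            if op == "imp":   # reverse (132) with the context copied to both premises
                return (provable(rest, succ + [f[1]])
                        and provable([f[2]] + rest, succ))
            raise ValueError(f"unknown connective {op!r}")
    # Then the leftmost compound formula of the succeedent.
    for i, f in enumerate(succ):
        if f[0] != "var":
            rest = succ[:i] + succ[i + 1:]
            op = f[0]
            if op == "not":   # reverse (137): Γ → Θ, ¬A from A, Γ → Θ
                return provable([f[1]] + ante, rest)
            if op == "or":    # reverse (150): Γ → Θ, A, B
                return provable(ante, rest + [f[1], f[2]])
            if op == "and":   # reverse (133): two premises
                return (provable(ante, rest + [f[1]])
                        and provable(ante, rest + [f[2]]))
            if op == "imp":   # reverse (131): A, Γ → Θ, B
                return provable([f[1]] + ante, rest + [f[2]])
            raise ValueError(f"unknown connective {op!r}")
    # Elementary sequence: demonstrable iff some letter occurs on both sides.
    return any(a in succ for a in ante)


A, B = ("var", "A"), ("var", "B")
print(provable([("not", ("or", A, B))], [("not", A)]))  # prints: True
```

On the example above the calls pass through exactly the sequences named in the text: $\to \neg A, A \vee B$, then $A \to A \vee B$, then $A \to A, B$. Each step removes one connective, so the recursion always terminates.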

Of course, more efficient proof procedures can be constructed for propositional calculus; however, the described procedure has the merit that it can be generalized to predicate calculus. In this generalization the quantifiers are eliminated by reversal of the deduction rules (139)-(142), in complete analogy with the technique described above for eliminating the logical connectives by reversal of the deduction rules (131)-(138).

The decision procedure constructed by Hao Wang naturally covers only a part of the formulas of predicate calculus (since no decision procedure exists for predicate calculus as a whole). However, it suffices to cover almost all the theorems of predicate calculus included in such a major monograph as Whitehead and Russell's Principia Mathematica.

The effectiveness of the decision procedure is improved by means of several additional techniques. Among them an important place is occupied by the reduction of formulas to the so-called minisphere form. In contrast with the prenex form, in which the region of action (scope) of the quantifiers is as large as possible, the minisphere form makes the region of action of the quantifiers as small as possible. When formulas are reduced to minisphere form, the operations of implication and equivalence are usually first expressed by means of disjunction, conjunction and negation. In this case the concept of minisphere form can be made precise by means of the following conditions.

First, the individual propositional letters and elementary predicates are minispheric formulas. Second, if the formulas A and B have minisphere form, then the formulas $A \vee B$, $A \wedge B$ and $\neg A$ are also minispheric. Third, if P(x) is a disjunction (respectively, a conjunction) of minispheric formulas, then the formula $\forall x P(x)$ (respectively, the formula $\exists x P(x)$) also has minisphere form. Fourth, if the formula P(x) in $\forall x P(x)$ (or Q(x) in $\exists x Q(x)$) begins with an existential quantifier (respectively, with a generality quantifier) and the formula P(x) (respectively, Q(x)) is minispheric, then the formula $\forall x P(x)$ (respectively, $\exists x Q(x)$) is also minispheric. Finally, fifth, a formula which begins with a chain of like quantifiers has minisphere form if every formula obtained from it by permuting these quantifiers and dropping the first of them has minisphere form.

The reduction of formulas of predicate calculus to minisphere form is frequently quite simple. It is then advisable to begin the decision procedure by reducing both parts of the given sequence to minisphere form while simultaneously eliminating (wherever possible) all the logical connectives by reversal of the deduction rules (131)-(138).

For the elimination of the quantifiers, instead of applying the (reversed) deduction rules (139)-(142), which requires observing certain restrictions, it is frequently advisable to use a simpler method based on the concept of the signs of the quantifiers occurring in the given sequence. To define this concept, we first consider the assignment of signs to the various parts of formulas of predicate calculus. First of all, each formula, considered as an occurrence in itself, is regarded as positive. If P is a positive (negative) part of the formula Q or of the formula R, then P is a positive (respectively, negative) part of the formulas $Q \wedge R$, $Q \vee R$, $\forall x Q$, $\exists x Q$. If D is a positive (negative) part of the formula S, then D is a negative (respectively, positive) part of the formulas $\neg S$ and $S \supset Q$, while D is a positive (respectively, negative) part of the formula $Q \supset S$.

If a part of any formula in the set of formulas making up a sequence is regarded as a part of the sequence, then it has in the sequence the same sign as in the corresponding formula if this formula occurs in the succeedent, and the opposite sign if the formula occurs in the antecedent. Every generality quantifier in the sequence is assigned the sign which its action region has in this sequence (the action region being regarded as a part of the sequence). The signs of the existential quantifiers are taken to be opposite to the signs of their action regions. For example, in the sequence $\forall x \exists y P(x, y), \neg \forall v Q(v) \to \forall z R(z) \wedge \exists u S(u)$ the quantifiers $\forall z$, $\forall v$ and $\exists y$ are positive, while the quantifiers $\forall x$ and $\exists u$ are negative. In establishing the signs of the quantifiers, all the variables bound by quantifiers must be pairwise different, and their notations must also differ from those of all the free variables. When these conditions are satisfied, the following decision procedure can be constructed for sequences in AE-form, i.e., sequences consisting of formulas in which no existential quantifier includes generality quantifiers in its action region.
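The sign rules stated above are easy to mechanize. The following Python sketch (our illustration; the tuple representation and function names are hypothetical) computes the sign of every quantifier in a sequence and enforces the requirement that bound variables be pairwise different; it does not check the condition on free variables. On the example sequence from the text it reproduces the signs given there.

```python
# Hypothetical representation: ("atom", "P", "x", ...), ("not", F), ("and", F, G),
# ("or", F, G), ("imp", F, G), ("all", var, F), ("ex", var, F).


def mark(formula, sign, signs):
    """Record the signs of the quantifiers in `formula`, whose own sign is +1 or -1."""
    op = formula[0]
    if op == "atom":
        return
    if op == "not":                   # ¬S reverses the sign of S
        mark(formula[1], -sign, signs)
    elif op in ("and", "or"):         # Q ∧ R and Q ∨ R keep the signs of Q and R
        mark(formula[1], sign, signs)
        mark(formula[2], sign, signs)
    elif op == "imp":                 # in S ⊃ Q, S is reversed and Q is kept
        mark(formula[1], -sign, signs)
        mark(formula[2], sign, signs)
    elif op in ("all", "ex"):
        var, body = formula[1], formula[2]
        if var in signs:              # bound variables must be pairwise different
            raise ValueError(f"variable {var!r} is bound twice")
        # The action region has the sign of the quantified formula:
        # a generality quantifier takes that sign, an existential one the opposite.
        if op == "all":
            signs[var] = ("∀", sign)
        else:
            signs[var] = ("∃", -sign)
        mark(body, sign, signs)
    else:
        raise ValueError(f"unknown operator {op!r}")


def quantifier_signs(ante, succ):
    """Map each bound variable to (quantifier, sign) for the sequence ante → succ."""
    signs = {}
    for f in ante:                    # antecedent: signs are reversed
        mark(f, -1, signs)
    for f in succ:                    # succeedent: signs are kept
        mark(f, +1, signs)
    return signs


# ∀x∃yP(x, y), ¬∀vQ(v) → ∀zR(z) ∧ ∃uS(u)
ante = [("all", "x", ("ex", "y", ("atom", "P", "x", "y"))),
        ("not", ("all", "v", ("atom", "Q", "v")))]
succ = [("and", ("all", "z", ("atom", "R", "z")), ("ex", "u", ("atom", "S", "u")))]
signs = quantifier_signs(ante, succ)
print(signs)
```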

First, all the formulas occurring in the sequence are reduced to minisphere form. Then, by reversal of rules (131)-(138), we eliminate all the logical (propositional) connectives which permit such elimination. The resulting sequences must be in AE-form (since otherwise the original sequence would not be an AE-sequence). In all these sequences all the quantifiers are dropped, the variables bound by negative quantifiers are replaced by pairwise different numbers, and the variables bound by positive quantifiers are retained without change.

Again applying the reversal of rules (131)-(138), we reduce the resulting sequences to elementary form, i.e., to a form containing neither quantifiers nor propositional connectives. The true elementary sequences (i.e., those in which at least one formula is common to the antecedent and the succeedent) are discarded. Performing all possible (not necessarily one-to-one) substitutions of variables for the numbers in the remaining sequences, we attempt to make all of them true. If this can be done, the original sequence was true; if not, it was false.

Let us consider as an example the two sequences $\forall x P(x) \to \exists y P(y)$ and $\exists x P(x) \to \forall y P(y)$. In the first sequence both quantifiers are negative. Therefore the described procedure for removing the quantifiers reduces it to the form $P(1) \to P(2)$. Substituting the variable x for the numbers 1 and 2, we transform this sequence into $P(x) \to P(x)$. Consequently, the first of the given sequences is true. Both quantifiers of the second sequence are positive. The procedure for eliminating the quantifiers reduces it to the form $P(x) \to P(y)$, which in the general case (for an arbitrary predicate P and a nontrivial object domain) is a false sequence. Consequently, the second of the original sequences is false.
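The last step of this procedure, the search for a substitution, can be sketched in Python. In this illustration (our own, with a hypothetical representation) the quantifiers are assumed to have been removed and the sequences reduced to elementary form already; integers stand for the numbers introduced for negatively quantified variables and strings for the retained variables. Besides the variables already present, one fresh variable is tried, which lets numbers be identified only with one another, as the first example requires; identifying more arguments never destroys a coincidence that already holds, so one fresh variable is enough.

```python
from itertools import product


def is_true(seq):
    """An elementary sequence is true if some atom occurs on both sides."""
    ante, succ = seq
    return any(atom in succ for atom in ante)


def substitute(seq, sigma):
    """Replace the numbers (ints) in the atoms of a sequence according to sigma."""
    def sub(atom):
        pred, args = atom
        return (pred, tuple(sigma.get(a, a) if isinstance(a, int) else a for a in args))
    ante, succ = seq
    return [sub(a) for a in ante], [sub(b) for b in succ]


def decide(sequences):
    """Discard true elementary sequences, then look for one (not necessarily
    one-to-one) substitution of variables for numbers making all the rest true."""
    remaining = [s for s in sequences if not is_true(s)]
    args = [a for ante, succ in remaining for _, xs in ante + succ for a in xs]
    numbers = sorted({a for a in args if isinstance(a, int)})
    variables = sorted({a for a in args if isinstance(a, str)})
    fresh = "w"                       # a variable not yet used in the sequences
    while fresh in variables:
        fresh += "'"
    candidates = variables + [fresh]
    for choice in product(candidates, repeat=len(numbers)):
        sigma = dict(zip(numbers, choice))
        if all(is_true(substitute(s, sigma)) for s in remaining):
            return True
    return False


first = ([("P", (1,))], [("P", (2,))])        # P(1) → P(2)
second = ([("P", ("x",))], [("P", ("y",))])   # P(x) → P(y)
print(decide([first]), decide([second]))      # prints: True False
```

The search runs through every assignment of candidate variables to numbers, so its cost grows exponentially with the number of numbers; this is acceptable only for small sequences such as the examples above.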

These results coincide with the results of direct verification of
the given sequences, which is not difficult to carry out in this case.
We note that if, contrary to the condition stipulated above, both bound
variables in the second sequence were designated by the same letter,
we would come to an incorrect conclusion, taking the sequence to be
true. It is also useful to note that the procedure described is
suitable, even without the preliminary reduction of the formulas to the
minisphere form, for resolving all AE-sequences containing no more than
one positive quantifier.

Along with the decision procedure described above for the
propositional calculus (procedure I), the procedure just described,
without the reduction of the formulas to the minisphere form
(procedure II), was programmed by Hao Wang for the IBM-704 universal
electronic digital computer.

Using program I, the machine required about three minutes to
prove all 220 theorems of the propositional calculus making up the
first five chapters of the monograph Principia Mathematica. The total
machine operating time (including the time for data entry and output of
results) amounted to about 37 minutes. Using program II, after about an
hour of operation the machine had proved about 130 of the 158 theorems
of the predicate calculus constituting the following five chapters of
the same monograph. In all, program II was able to prove 139 theorems,
although the decision time increased considerably to do this.
If we supplement procedure II with the technique for eliminating
quantifiers by the reverse application of rules (139)-(142), and
introduce into it certain preliminary transformations of the formulas
which constitute the original sequence, then all 158 of the theorems
indicated above become demonstrable. These preliminary transformations
amount to applying (as long as possible) the following replacement
rules to the formulas which make up the original sequence:
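$$\forall x\,(P(x) \wedge Q(x)) \ \text{is replaced by}\ \forall x\,P(x) \wedge \forall x\,Q(x); \tag{151}$$
$$\exists x\,(P(x) \vee Q(x)) \ \text{is replaced by}\ \exists x\,P(x) \vee \exists x\,Q(x); \tag{152}$$
$$\forall x\,(P(x) \supset (Q(x) \wedge R(x))) \ \text{is replaced by}\ \forall x\,(P(x) \supset Q(x)) \wedge \forall x\,(P(x) \supset R(x)). \tag{153}$$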
These rules to a certain degree replace the procedure for reducing
the formulas to the minisphere form, which in the general case is
quite complex. If, after the application of these rules and the
elimination of the logical connectives by the reverse application of
rules (131)-(138), all the resulting sequences are AE-sequences and,
in addition, are either minispheric or contain no more than one
positive quantifier, then the sequence can, as a rule, be resolved by
procedure II.
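Rules (151)-(153) can be sketched as a bottom-up rewriting function
over a hypothetical nested-tuple representation of formulas; the name
`rewrite` is illustrative only. The function rewrites subformulas
first and re-applies the rules to every newly created quantified
formula, so that the rules are applied as long as possible.

```python
# Hypothetical formula representation (nested tuples):
#   ("atom", predicate, variable)
#   ("and", A, B), ("or", A, B), ("implies", A, B)
#   ("forall", variable, A), ("exists", variable, A)


def rewrite(f):
    """Apply replacement rules (151)-(153) throughout f, as long as possible."""
    tag = f[0]
    if tag == "atom":
        return f
    if tag in ("forall", "exists"):
        v, body = f[1], rewrite(f[2])  # normalize the body first
        # (151): forall x (A and B)  ->  forall x A  and  forall x B
        if tag == "forall" and body[0] == "and":
            return ("and", rewrite(("forall", v, body[1])),
                    rewrite(("forall", v, body[2])))
        # (152): exists x (A or B)  ->  exists x A  or  exists x B
        if tag == "exists" and body[0] == "or":
            return ("or", rewrite(("exists", v, body[1])),
                    rewrite(("exists", v, body[2])))
        # (153): forall x (A implies (B and C))
        #        ->  forall x (A implies B)  and  forall x (A implies C)
        if tag == "forall" and body[0] == "implies" and body[2][0] == "and":
            _, a, (_, b, c) = body
            return ("and", rewrite(("forall", v, ("implies", a, b))),
                    rewrite(("forall", v, ("implies", a, c))))
        return (tag, v, body)
    # Binary propositional connective: rewrite both operands.
    return (tag, rewrite(f[1]), rewrite(f[2]))


P, Q, R = ("atom", "P", "x"), ("atom", "Q", "x"), ("atom", "R", "x")
print(rewrite(("forall", "x", ("implies", P, ("and", Q, R)))))
```

The call prints the two conjuncts of rule (153):
`forall x (P(x) implies Q(x))` and `forall x (P(x) implies R(x))`.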


Hao Wang also proposed further improvements of the procedures
described, which make it possible to go beyond the limits of the
AE-formulas alone. We note that with the aid of one of these improved
procedures the IBM-704 machine proved 350 theorems from the first nine
chapters of Principia Mathematica in 8.5 minutes. The procedures
constructed by Hao Wang can apparently be transformed without
difficulty into quasi-decision procedures for the entire restricted
predicate calculus, in the sense that (after suitable supplementation)
they can prove any demonstrable formula of this calculus and can refute
"almost all" the nondemonstrable formulas. The expression "almost all"
is understood here in a quite practical sense and cannot, of course, be
understood as "all, except for a finite number."

We should underscore the difference between the purely theoretical 
and practical approaches to the solution of the problem of decidabil- 
ity. In the theoretical aspect the prime importance lies in the very 
fact of the existence or nonexistence of the decision procedure for a 
particular class of formulas. The decision procedures which are con- 
structed for this purpose are in the majority of cases completely un- 
suitable for the automation of the proofs of the theorems, since they 
lead to excessively cumbersome and lengthy constructions. 

On the other hand, in the practical approach to the construction 
of the decision procedures particular attention is devoted to the 
questions of the speed and ease of performance of these procedures. 

At the same time, we frequently reconcile ourselves to the fact that
the constructed decision procedure does not encompass absolutely all
the formulas of the given class, provided that in its practical
application the cases in which it gives no answer (within some
predetermined time) are relatively infrequent. Thus, practical decision
procedures may not be decision procedures in the exact sense of the
word, but only quasi-decision procedures.
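The idea of a quasi-decision procedure, one that is allowed to give no
answer after a predetermined amount of work, can be sketched as a step
budget wrapped around any search that reports its verdict step by
step. Both functions below are hypothetical illustrations;
`composite_search` is only a toy stand-in for a proof search, assumed
to yield None while still working and then True or False.

```python
from typing import Iterator, Optional


def quasi_decide(search: Iterator[Optional[bool]], budget: int) -> Optional[bool]:
    """Run a step-wise decision search for at most `budget` steps.

    Returns True or False if the search reaches a verdict within the budget,
    and None (no answer) otherwise.
    """
    for step, verdict in enumerate(search):
        if step >= budget:
            return None  # predetermined amount of work exhausted
        if verdict is not None:
            return verdict
    return None


def composite_search(n: int) -> Iterator[Optional[bool]]:
    """Toy stand-in for a proof search: is n (n >= 2) composite?"""
    d = 2
    while d * d <= n:
        if n % d == 0:
            yield True
            return
        yield None  # one more unsuccessful step
        d += 1
    yield False


print(quasi_decide(composite_search(91), budget=100))  # True (91 = 7 * 13)
print(quasi_decide(composite_search(97), budget=100))  # False (97 is prime)
print(quasi_decide(composite_search(97), budget=3))    # None (no answer)
```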
Therefore it is not surprising that in practice effective decision
procedures can be constructed not only in decidable theories but also
in undecidable ones. We should not forget that a human being working
in the domain of some undecidable theory (for example, the arithmetic
of the natural numbers) makes use of a finite (and frequently not even
very large) number of techniques for carrying out proofs and
constructing counterexamples. The task of practical decision
procedures is then to formalize these techniques.

Of course, the solution of this problem is simplified if the domain
of application of the decision procedure is limited in advance to some
sufficiently narrow area. At the same time, establishing beforehand the
theoretical possibility of solving the decision problem in the
corresponding domain does not, generally speaking, simplify the
construction of a practical decision algorithm.

Several decision procedures have been constructed for relatively
simple branches of mathematics (the algebra of real polynomials,
elementary geometry, the theory of Abelian groups with a finite number
of generators, etc.). However, these procedures were constructed, as a
rule, from a purely theoretical standpoint, and a considerable amount
of effort will be required to transform them into practical decision
algorithms.

Of great interest is the problem of the construction of algorithms 
which would not simply prove or disprove the propositions specified 
by the human but would themselves search out new interesting theorems 
in a particular field. For the construction of this sort of algorithm 
it is necessary to develop sufficiently good criteria for the evalua- 
tion of the degree of nontriviality of a theorem. 
One of the first attempts in this direction was made by Hao Wang
[77], who constructed a program for screening (with subsequent proof)
theorems of the propositional calculus. This attempt, however, was not
completely successful, owing to the paucity of the nontriviality
criteria included in the program: the machine printed out too many
theorems without adequately screening out the uninteresting (trivial)
ones.

The first nontriviality criterion which usually comes to mind is
that a nontrivial theorem must be concisely formulated (expressed by a
short formula) and yet not have short proofs. More natural criteria
(in agreement with the conventional ideas of the nontriviality of
theorems) can be established if the process of screening new theorems
and their proofs is constructed on the principles of self-organization.
This can be achieved by supplementing the original axiom system with
nontrivial theorems selected by the program. It is natural to evaluate
the complexity of a theorem by the minimal number of steps in which its
proof can be accomplished. We call nontrivial propositions the original
axioms together with all the theorems whose complexity exceeds some
threshold selected in advance. Each newly proved nontrivial proposition
is adjoined to the axiom system, after which the complexity of all the
previously obtained theorems is re-evaluated. Excluding from the axiom
system the theorems which have become trivial, we look for a new
nontrivial theorem, adjoin it to the axiom system, again exclude the
theorems which have become trivial, and so on.
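The self-organizing selection of nontrivial theorems can be
illustrated on a deliberately tiny toy system. Everything in it is an
assumption made for illustration: the propositions are positive
integers, the only original axiom is 1, the two hypothetical inference
rules derive n + 1 and 2n from n, and the complexity of a proposition
is the minimal number of rule applications (found by breadth-first
search) needed to derive it from the other members of the current
axiom system. The function `complexity` returns None both when more
than `limit` steps are needed and when the proposition is not derivable
at all; in this toy every integer is derivable from 1, so None simply
means "nontrivial."

```python
from collections import deque

# Toy model (hypothetical): propositions are positive integers and the two
# inference rules derive n + 1 and 2 * n from n.


def complexity(target, axioms, limit):
    """Minimal number of rule applications deriving `target` from `axioms`,
    or None if more than `limit` steps are needed (or it is not derivable)."""
    if target in axioms:
        return 0
    seen = set(axioms)
    frontier = deque((a, 0) for a in axioms)  # breadth-first search
    while frontier:
        n, depth = frontier.popleft()
        if depth == limit:
            continue
        for m in (n + 1, 2 * n):
            if m == target:
                return depth + 1
            # Both rules only increase n, so values above target are useless.
            if m < target and m not in seen:
                seen.add(m)
                frontier.append((m, depth + 1))
    return None


def self_organize(original_axioms, candidates, threshold, rounds):
    """Adjoin nontrivial theorems one at a time, dropping those that become
    trivial relative to the rest of the axiom system."""
    system = set(original_axioms)
    adjoined = []  # theorems added to the axiom system, in order
    for _ in range(rounds):
        new = next((t for t in candidates
                    if t not in system
                    and complexity(t, system, threshold) is None), None)
        if new is None:
            break
        system.add(new)
        adjoined.append(new)
        # Re-evaluate the adjoined theorems; the original axioms always stay.
        for t in list(adjoined):
            if complexity(t, system - {t}, threshold) is not None:
                system.discard(t)
                adjoined.remove(t)
    return system, adjoined


print(complexity(6, {1}, 3))  # 3  (1 -> 2 -> 3 -> 6)
print(complexity(7, {1}, 3))  # None: 7 is nontrivial for threshold 3
system, adjoined = self_organize({1}, range(2, 60), threshold=3, rounds=5)
print(adjoined)
```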

This self-organizing system for the construction of formal
deductive theories resembles to a considerable degree the way a human
constructs such theories. We should note that the transition to
constructing deductive theories on self-organization principles forces
a new approach to the problem of their decidability. Of course, if the
indicated process goes on in isolation from the outside world, then in
the final analysis it is equivalent to some "rigid"
(non-self-organizing) algorithm, so that nothing actually changes in
the formulation of the decidability problem. The same will obviously be
true when the process is influenced by some algorithm external to it
(the case of the "constructive external world").

In the case of the "nonconstructive external world," when the
external actions on the process under consideration cannot be reduced
to an algorithm, the situation changes in principle. Indeed, let us
assume that the process in question can accumulate information coming
from outside and compare it with the formulas of the restricted
predicate calculus given to it. Let us assume further that the
information arriving from outside consists of two sequences of formulas
of the restricted predicate calculus, arranged in order of increasing
complexity (evaluated by the number of symbols making up the formula).
If the first sequence contains all the true formulas and the second all
the false formulas of the predicate calculus (which is not impossible
in the case of a "nonconstructive medium"), then it is not difficult to
construct a completely constructive decision procedure for the
(restricted) predicate calculus, based on accumulating an ever greater
quantity of external information and comparing it with the formulas
to be resolved.
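The constructive decision procedure that relies on a nonconstructive
medium amounts to reading the two external sequences in parallel until
the formula to be resolved turns up. In the Python sketch below the
name `decide_with_medium` is hypothetical and the medium is replaced
by a toy pair of infinite streams. The ordering by increasing
complexity is not needed for termination here, because every formula
is assumed to occur in exactly one of the two sequences.

```python
from itertools import count, zip_longest

_END = object()  # marks the end of a finite (toy) sequence


def decide_with_medium(formula, true_formulas, false_formulas):
    """Resolve `formula` by reading the two external sequences in parallel.

    Assumes that the first sequence lists every true formula and the second
    every false one, so `formula` eventually appears in exactly one of them.
    """
    for true_f, false_f in zip_longest(true_formulas, false_formulas,
                                       fillvalue=_END):
        if true_f == formula:
            return True
        if false_f == formula:
            return False
    raise ValueError("formula did not appear in either sequence")


# Toy "medium" (hypothetical): formulas are positive integers, the even ones
# count as true and the odd ones as false; both streams are infinite.
print(decide_with_medium(10, count(2, 2), count(1, 2)))  # True
print(decide_with_medium(7, count(2, 2), count(1, 2)))   # False
```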

It is obvious that the "nonconstructive medium" is not at all
obliged to take the decision task completely upon itself, as was
actually the case in the example presented. The nonconstructive
sequences which it generates need not even be sequences of formulas
directly. They must possess only one characteristic: the possibility of
their constructive transformation (within the framework of the
self-organizing decision procedure under consideration) into suitably
ordered sequences of demonstrable (true) and nondemonstrable (false)
formulas of the restricted predicate calculus.

It is possible that these considerations may serve in the future
as the basis for systems for far-reaching automation of the processes
of scientific creativity, principally the automation of the
construction of complex deductive theories.